Attribute data handling in a GIS environment.

Source: blogs.ubc.ca/advancedgis/files/2019/11/Lecture13DBMS.pdf · 2019. 11. 21.

Sep 24, 2020


dariahiddleston
Transcript
  • Attribute data handling in a GIS environment.

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of database

    Relational model

    SQL

    Database design

  • Linking data to place

    Having previously defined how geography can be modeled within a GIS, we also need to consider how the characteristics (or attributes) of the geographic features are associated with that geography.

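  • GIS Data: Attribute linkages

    Spatial data: P, L, A, Px (points, lines, areas, pixels)

    Attribute data: NOIR (nominal, ordinal, interval, ratio)

    [Figure: points, lines and areas plotted against x and y axes]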

  • Storing attribute data

    Attribute data are traditionally stored separately from the coordinate data

    (Can you suggest why?)

    Feature identifiers enable a link to be made to an attribute table:

      point attribute table

      line or arc attribute table

      polygon attribute table

    [Figure label: # of coordinates]

  • Storing attribute data

    [Figure: three polygons labelled 1, 2 and 3]

    Polygon Attribute Table

    Polygon ID   Feature ID
    1            A2343
    2            A2348
    3            A4523

    The Polygon ID would be assigned by the GIS software (when importing / creating the file), while the Feature ID would be assigned to each polygon by the user and used to link to existing databases.

    Similarly, we could create point or line linkages.
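
    The Polygon ID / Feature ID linkage can be sketched as two lookups. This is a minimal illustration, not the mechanism of any particular GIS package: the Polygon ID and Feature ID pairs come from the Polygon Attribute Table on this slide, while `external_db` and its `owner` and `land_use` values are hypothetical and invented for the example.

```python
# Polygon attribute table from the slide: Polygon ID -> Feature ID.
polygon_table = {1: "A2343", 2: "A2348", 3: "A4523"}

# Hypothetical existing database keyed by Feature ID (all values invented).
external_db = {
    "A2343": {"owner": "Owner X", "land_use": "residential"},
    "A2348": {"owner": "Owner Y", "land_use": "commercial"},
    "A4523": {"owner": "Owner Z", "land_use": "park"},
}


def attributes_for_polygon(polygon_id):
    """Follow Polygon ID -> Feature ID -> external attribute record."""
    feature_id = polygon_table[polygon_id]
    return {"feature_id": feature_id, **external_db[feature_id]}


record = attributes_for_polygon(2)
print(record)
```

    The software-assigned Polygon ID never appears in the external database; only the user-assigned Feature ID has to match on both sides.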

  • Storing attribute data

    Good organization of the attribute data is very important

    In socioeconomic GIS applications, the attribute data component is often much larger than the spatial component (i.e., relatively few census tracts, but hundreds of variables)

  • ID    District   Province
    101   Palma      Merida
    102   S. Maria   Merida
    103   Veralo     Merida
    104   Bolo       La Paz
    105   Jose       La Paz
    106   Malabo     La Paz
    107   Chilabo    La Paz
    …     …          …

    [Map: districts labelled 101 to 107]

    P_Pop    P_TFR   Province
    397881   3.7     La Paz
    214084   3.2     Merida
    …        …       …

    [Figure labels: Primary Key, Foreign Key]
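
    The key linkage between the district and province tables on this slide can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the values are those shown on the slide, but the table and column names are made up here, and treating `ID` as the primary key and `Province` as the foreign key is a reading of the slide's "Primary Key" and "Foreign Key" labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE province (
        name  TEXT PRIMARY KEY,
        p_pop INTEGER,
        p_tfr REAL
    );
    CREATE TABLE district (
        id       INTEGER PRIMARY KEY,            -- primary key
        name     TEXT,
        province TEXT REFERENCES province(name)  -- foreign key
    );
""")
conn.executemany(
    "INSERT INTO province VALUES (?, ?, ?)",
    [("La Paz", 397881, 3.7), ("Merida", 214084, 3.2)],
)
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [
        (101, "Palma", "Merida"), (102, "S. Maria", "Merida"),
        (103, "Veralo", "Merida"), (104, "Bolo", "La Paz"),
        (105, "Jose", "La Paz"), (106, "Malabo", "La Paz"),
        (107, "Chilabo", "La Paz"),
    ],
)

# Join each district to the attributes of its province via the shared key.
rows = conn.execute("""
    SELECT d.id, d.name, p.name, p.p_pop, p.p_tfr
    FROM district AS d
    JOIN province AS p ON d.province = p.name
    ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```

    Each province's population and TFR are stored once and attached to all of its districts only when the query runs.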

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of database

    Relational model

    SQL

    Database design

  • Definitions

    Database – an integrated set of data on a particular subject

    Geographic (=spatial) database - database containing geographic data of a particular subject for a particular area

    Database Management System (DBMS) – software to create, maintain and access databases

  • Storing data

    There are two fundamental ways to store data:

      As a simple file (e.g., a txt file) (example?)

      In a ‘database’

  • Simple file structures

  • Moving from files to databases

    http://cs1.mcm.edu/~rob/class/dbms/notes/Chapt01/index.html

  • Advantages of DBs over files

    Avoids redundancy and duplication

    Reduces data maintenance costs

    Applications are separated from the data

      Applications persist over time

      Support multiple concurrent applications

    Better data sharing

    Security and standards can be defined and enforced

      ArcGIS topology rules

      ArcGIS attribute rules

    http://desktop.arcgis.com/en/arcmap/latest/manage-data/editing-topology/geodatabase-topology-rules-and-topology-error-fixes.htm

    http://desktop.arcgis.com/en/arcmap/latest/manage-data/editing-attributes/about-maintaining-attribute-integrity-while-editing.htm

  • Historical disadvantages of DBs compared with files

    Expense (e.g., an Oracle license would cost over $50k for a large commercial enterprise), although free / open-source DBMSs are available, such as MySQL, PostgreSQL and MongoDB.

    Complexity – not an issue with current DBMSs, although with modern data sets you still must consider how best to organize your data.

    Performance – generally not an issue with the current generation of DBMSs, although with the terabyte-sized data sets available today it is becoming an issue once again.

    Integration with other systems – not an issue with the current generation of DBMSs

    Data volume – with massive data volumes processing slows down (this is where NoSQL, MongoDB and Hadoop shine)

    https://www.mysql.com/

    https://www.postgresql.org/

    https://www.mongodb.com/

  • Characteristics of DBMS (1)

    Data model support for multiple data types

      MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No (Logical), BLOBs, OLE Object, Hyperlink, Lookup Wizard (other DBMSs are similar)

    Load data from files, databases and other applications

    Indexes for rapid retrieval

    http://www.webopedia.com/TERM/B/BLOB.html

    http://www.webopedia.com/TERM/O/OLE_DB.html

    http://en.wikipedia.org/wiki/Database_index
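
    The "indexes for rapid retrieval" point can be observed in SQLite, used here only as a convenient stand-in for a DBMS. The sketch asks for the query plan before and after an index is created. The `parcel` table, its columns and its generated values are hypothetical, and the exact wording of the plan text differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, zone TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcel (zone, area) VALUES (?, ?)",
    [(f"zone{i % 50}", float(i)) for i in range(10_000)],
)

query = "SELECT * FROM parcel WHERE zone = 'zone7'"


def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return " ".join(details)


plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_parcel_zone ON parcel (zone)")
plan_after = plan(query)   # the plan now refers to idx_parcel_zone

matches = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(matches), "matching rows")
```

    The query returns the same rows either way; the index changes only how the rows are located.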

  • Characteristics of DBMS (2)

    Query language – SQL (also QBE, …) (also Spatial SQL)

    Security – controlled access to data

      Multi-level groups

    Controlled update using a transaction manager (think ATM)

    Ensure attribute consistency (domain rules)

    Backup and recovery

    DBA tools

      Configuration, tuning

    http://www.w3schools.com/sql/

    http://desktop.arcgis.com/en/arcmap/latest/manage-data/geodatabases/an-overview-of-attribute-domains.htm

    http://www.databaseanswers.org/what_is_a_dba.htm
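
    The "think ATM" transaction idea and the domain-rule idea can be combined in one small sketch. It uses SQLite through Python's `sqlite3` module. The `account` table and the amounts are hypothetical, and the `CHECK` constraint is only a stand-in for a domain rule (it is not how ArcGIS attribute domains are implemented). The transfer credits the destination first so that a failed debit visibly undoes work already done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)  -- domain rule
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(amount, src, dst):
    """Move money between accounts as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the rule was violated; the credit above was undone


ok = transfer(30, src=1, dst=2)       # allowed
failed = transfer(500, src=1, dst=2)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

    After the rejected transfer, neither account has changed: the credit that had already been applied is rolled back together with the failed debit.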

  • Characteristics of DBMS (3)

    Applications

      CASE tools (Computer-aided software engineering)

      Forms builder

      Report writer

      Internet Application Server

    Programmable API (Application Programming Interface) (Data abstraction layer)

    http://en.wikipedia.org/wiki/Database_abstraction_layer

  • Role of DBMS

    [Diagram: a System made up of GIS and DBMS layers over the Data]

    GIS: Data entry, Editing, Visualization, Mapping, Analysis (spatial data manipulation)

    DBMS: Storage, Indexing, Security, Query

    Data: Spatial and attribute data

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of DBMS models

    Relational model

    SQL

    Google Street View

  • Types of DBMS Models

    Hierarchical

    Network

    Relational - RDBMS

    Object-oriented - OODBMS

    Object-relational – ORDBMS

    Newer forms: NoSQL, NewSQL

  • Hierarchical

    [Figure: tree with Country at the top, Provinces below it, and Counties below a Province]

  • Hierarchical and Network

    Hierarchical

    Network

  • Relational Tables: Topological data model

    [Figure: polygons A, B and C bounded by lines 1 to 6, which meet at nodes I to IV]

    Node   X   Y
    I      1   4
    II     4   4
    III    6   4
    IV     4   1

    Line   From   To    Left   Right
    1      I      III   O      A
    2      I      IV    B      O
    3      III    IV    O      C
    4      I      II    A      B
    5      II     III   A      C
    6      II     IV    C      B

    Poly   Lines
    A      1,4,5
    B      2,4,6
    C      3,5,6

    O = “outside” polygon

    [Figure labels: Keys, Foreign keys]
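
    The topological tables on this slide can be loaded into a relational database and queried. The sketch below uses SQLite via Python's `sqlite3` module. The Node and Line values are those given on the slide, while the column names (`from_node`, `left_poly`, and so on) are made up for the example. It rebuilds the Poly table from the Left/Right columns of the Line table and lists which polygons share a boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node TEXT PRIMARY KEY, x REAL, y REAL)")
conn.execute("""
    CREATE TABLE line (
        line       INTEGER PRIMARY KEY,
        from_node  TEXT REFERENCES node(node),  -- foreign key
        to_node    TEXT REFERENCES node(node),  -- foreign key
        left_poly  TEXT,
        right_poly TEXT
    )
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [("I", 1, 4), ("II", 4, 4), ("III", 6, 4), ("IV", 4, 1)],
)
conn.executemany(
    "INSERT INTO line VALUES (?, ?, ?, ?, ?)",
    [
        (1, "I", "III", "O", "A"),
        (2, "I", "IV", "B", "O"),
        (3, "III", "IV", "O", "C"),
        (4, "I", "II", "A", "B"),
        (5, "II", "III", "A", "C"),
        (6, "II", "IV", "C", "B"),
    ],
)


def boundary_lines(poly):
    """Lines that have the polygon on their left or right side."""
    rows = conn.execute(
        "SELECT line FROM line WHERE left_poly = ? OR right_poly = ? ORDER BY line",
        (poly, poly),
    )
    return [row[0] for row in rows]


poly_lines = {poly: boundary_lines(poly) for poly in "ABC"}
print(poly_lines)  # {'A': [1, 4, 5], 'B': [2, 4, 6], 'C': [3, 5, 6]}

# Two polygons are adjacent when one line separates them (neither side is O).
adjacent = sorted(
    {
        tuple(sorted(pair))
        for pair in conn.execute(
            "SELECT left_poly, right_poly FROM line "
            "WHERE left_poly <> 'O' AND right_poly <> 'O'"
        )
    }
)
print(adjacent)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

    The derived line lists match the Poly table on the slide, which shows that the Poly table is redundant with the Line table and could be computed rather than stored.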

  • Object-oriented DBMS

    Inheritance, encapsulation

    GE Smallworld

    BC Hydro

    http://en.wikipedia.org/wiki/Smallworld

    http://www.thefreelibrary.com/Smallworld,+BC+Hydro+and+Westech+Information+Collaborate+to+Develop...-a054570625

  • Overview

    Network

      essentially a programmer's database model

      efficient but inflexible and hard to understand

    Relational (ESRI Geodatabase)

      its only complex data type is the relation

      it is the only complete data model aimed at users instead of programmers

      relational query languages are easier to use than full-blown programming languages

      rich underlying theory

      separation of implementation and design

    Object-Oriented (GE Smallworld)

      an extension of object-oriented programming

      no generally agreed upon formal data model

      great freedom regarding complex data structures

      inheritance

      user-defined types

      encapsulation

  • Relational vs non-relational DBMS

    SQL Databases

    Also known as relational databases, define and manipulate data based on structured query language (SQL). These are most popularly used and useful for handling structured data that organizes elements of data and standardizes how they relate to one another and to different properties.

    NoSQL Databases

    Also known as non-relational databases, allow you to store and retrieve unstructured data using a dynamic schema. NoSQL is popularly used for its flexible ability to create a unique structure, and can be document, graph, column, or even key-value organized as a data structure.

    SQL has had a large lead over the non-relational alternatives for decades, but NoSQL is quickly closing the gap with popular databases such as MongoDB, Redis, and Cassandra.

    Source

    https://scalegrid.io/blog/2019-database-trends-sql-vs-nosql-top-databases-single-vs-multiple-database-use/

  • Current DBMS Market Shares

    https://scalegrid.io/blog/2019-database-trends-sql-vs-nosql-top-databases-single-vs-multiple-database-use/

  • Relational DBMS (2)

    Most popular type of DBMS

    ∼60% of data in a DBMS is in a RDBMS (increasingly data is stored in MongoDB, a NoSQL platform based on JSON)

    https://www.mongodb.com/

    https://en.wikipedia.org/wiki/JSON

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of databases

    Relational model

    SQL

    Database design

    MongoDB

  • Relational DBMS (1)

    Data stored as tuples (tup-el), conceptualized as tables

    Table – data about a class of objects

      Two-dimensional list (array)

      Rows = objects

      Columns = object states (properties, attributes)

  • Table

    Row = record = tuple [# of rows = cardinality]

    Column = field = attribute = property [# of columns = degree]

    Table = file = relation

    FID = Primary Key = Index (Feature ID)

    Row = object

    Table = Object Class

    [Figure label: Foreign key]
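
    The terms cardinality and degree can be made concrete with a small table. This is only an illustration: the `parcel` table, its columns and its values are hypothetical, and SQLite (via Python's `sqlite3`) is used as a stand-in for whichever DBMS holds the attribute table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (fid INTEGER PRIMARY KEY, zone TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "R1", 520.0), (2, "C2", 1310.5), (3, "R1", 488.2), (4, "P1", 2750.0)],
)

cursor = conn.execute("SELECT * FROM parcel")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()  # each row is one tuple (record)

cardinality = len(rows)   # number of rows
degree = len(columns)     # number of columns

print(columns)
print("cardinality =", cardinality, "degree =", degree)
```

    Here `fid` plays the role of the FID primary key: it is unique for every row, so each tuple can be identified regardless of row order.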

  • Relation Rules (Codd, 1970)

    Only one value in each cell (intersection of row and column)

    All values in a column are about the same subject

    Each row is unique

    No significance in column sequence

    No significance in row sequence

  • Normalization

    Process of converting tables to conform to Codd’s relational rules

    Split tables into new tables that can be joined at query time

      The relational join

    Several levels of normalization (reduce redundancy)

      Forms: 1NF, 2NF, 3NF, etc.

    Normalization creates many expensive joins

    De-normalization is OK for performance optimization
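
    The split-then-join idea can be sketched in a few lines of plain Python. The rows reuse district and province values that appear earlier in this transcript; laying them out as one flat, un-normalized table is an assumption made here for illustration, not something shown on the slides.

```python
# Un-normalized rows: the province attributes repeat for every district.
# Columns: district ID, district name, province, P_Pop, P_TFR.
flat = [
    (101, "Palma", "Merida", 214084, 3.2),
    (102, "S. Maria", "Merida", 214084, 3.2),
    (104, "Bolo", "La Paz", 397881, 3.7),
    (105, "Jose", "La Paz", 397881, 3.7),
]

# Normalize: the district table keeps only a foreign key to the province,
# and each province's facts are stored exactly once.
district = [(d_id, name, prov) for d_id, name, prov, _, _ in flat]
province = {prov: (pop, tfr) for _, _, prov, pop, tfr in flat}

# A relational join at query time rebuilds the original rows.
rejoined = [(d_id, name, prov, *province[prov]) for d_id, name, prov in district]

print(province)
print(rejoined == flat)
```

    Updating a province's population now means changing one entry instead of every district row, which is the redundancy that normalization removes. The price is the join needed to see the combined rows again.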

  • Tax assessment database

    Joined table

    Data partially normalized into three subtables

  • Relational Join

    Fundamental query operation (ArcGIS)

    Occurs because

      Normalization

      Data created/maintained by different users, but integration needed for queries

    Table joins use common keys (column values -- foreign keys)

    Table (attribute) join concept has been extended to geographic case

    http://desktop.arcgis.com/en/arcmap/latest/manage-data/tables/about-joining-and-relating-tables.htm

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of database

    Relational model

    SQL

    Database design

  • SQL

    Structured (or Standard) Query Language (pronounced SEQUEL)

    Developed by IBM in the 1970s

    Now de facto and de jure standard for accessing relational databases

    Three types of usage

      Stand-alone queries

      High-level programming

      Embedded in other applications

  • Types of SQL Statements

    Data Definition Language (DDL)

      Create, alter and delete data structures

      CREATE TABLE, CREATE INDEX

    Data Manipulation Language (DML)

      Retrieve and manipulate data

      SELECT, UPDATE, DELETE, INSERT

    Data Control Language (DCL)

      Control security of data

      GRANT, CREATE USER, DROP USER
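
    The statement types above can be seen side by side in a short session. The sketch uses SQLite via Python's `sqlite3` as a convenient stand-in. The table name, the index name and the edits made to the rows are illustrative only. SQLite has no user accounts, so the DCL statement is shown as text and not executed, and `gis_reader` is a hypothetical user name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structures that will hold the data.
conn.execute("CREATE TABLE district (id INTEGER PRIMARY KEY, name TEXT, province TEXT)")
conn.execute("CREATE INDEX idx_district_province ON district (province)")

# DML: insert, change, remove and retrieve the data itself.
conn.execute("INSERT INTO district VALUES (101, 'Palma', 'Merida')")
conn.execute("INSERT INTO district VALUES (104, 'Bolo', 'La Paz')")
conn.execute("UPDATE district SET province = 'La Paz' WHERE id = 101")
conn.execute("DELETE FROM district WHERE id = 104")
remaining = conn.execute("SELECT id, name, province FROM district").fetchall()
print(remaining)  # [(101, 'Palma', 'La Paz')]

# DCL: not supported by SQLite; a server DBMS would accept a statement
# of roughly this form (gis_reader is a hypothetical user).
dcl_example = "GRANT SELECT ON district TO gis_reader;"
print(dcl_example)
```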

  • Outline

    Linking data to place

    Definitions

    Characteristics of DBMS

    Types of database

    Relational model

    SQL

    Database design

  • Steps involved in database creation

    Data investigation: consider the type, quantity and qualities of data to be included in the database; the nature of the entities and attributes is decided (inventory of data, needs analysis).

    Data modeling: form a conceptual model of data by examining the relations between entities and the characteristics of entities and attributes (logical design--infological model).

  • Steps involved in database creation

    Database design: creation of a practical design for the database. This step depends upon and is constrained by the software being used. Field names, specific attribute types and structures (e.g., tables) are decided (physical design--datalogical model).

    Database implementation: populating the database with attribute data. This is followed by monitoring and upkeep, fine tuning, modification and updating.

  • Database design perspectives

    Infological problems deal with how to define the information to be provided by the system to satisfy the needs of its users.

    Datalogical problems are about how to design the structure and operation of the system and to take full advantage of current information technology available.

    Essentially, infological work refers to system analysis and conceptual modeling, and datalogical work to technical design and physical implementation of the system.

  • Data Model Levels

    [Diagram: Reality, Conceptual Model, Logical Model and Physical Model arranged along an arrow labelled "Increasing abstraction"]

    Human-oriented / Computer-oriented

    Infological / Datalogical

    Relate back to Objects/Fields and Vector/Raster

    ERM process

  • ERM

    Entity Relationship Modelling

    The identification of entities

    The identification of relations between entities

    The identification of attributes of entities (infological steps)

    The derivation of tables from this (datalogical steps)

    UML

    http://edndoc.esri.com/arcobjects/9.2/NET_Server_Doc/manager/geodatabase/designing_a_geodatabase/a_note_a584253231.htm

  • ERM

    [Diagram legend: Entity (Entity Name); Relationship (Relationship Name); Attribute (Attribute Name); Mandatory Existence; Optional Existence; One; Many]

    [Example diagram: entities Regional Authority, Water Dept, Municipality and Engineering; relationship Oversight]

  • ERM Relations

    Mapping an ER Model into a table.

    [Diagram: Student entity with attributes Address, DoB and StudentID]

    Student Name   Address   Student ID   Date of Birth
    Jane Doe       Dunbar    456123       12/03/90
    Jim Smith      Kits      876562       01/10/89

    Example of 1:1 relations

    Example of a 1:M relation (ArcGIS example)

    Example of a M:N relation (ArcGIS relations)

    [Diagram labels: Students, Courses; Course, Instructor]

    http://support.esri.com/cn/knowledgebase/techarticles/detail/37544

    http://desktop.arcgis.com/en/arcmap/latest/manage-data/relationships/benefits-of-relationship-classes.htm
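
    One common way to map an M:N relation into tables is to add a third, linking table. The sketch below assumes that the Students and Courses labels on this slide stand for such an M:N relation. The student rows are taken from the slide. The `course` and `enrolment` tables, the course codes and the instructor names are hypothetical. Storing the instructor as a column of `course` further assumes that each course has exactly one instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        address    TEXT,
        dob        TEXT
    );
    CREATE TABLE course (
        course_id  TEXT PRIMARY KEY,
        instructor TEXT
    );
    -- Linking table: one row per (student, course) pair.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(456123, "Jane Doe", "Dunbar", "12/03/90"),
     (876562, "Jim Smith", "Kits", "01/10/89")],
)
# Hypothetical courses and enrolments, invented for the example.
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("GIS101", "Instructor A"), ("DB201", "Instructor B")],
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [(456123, "GIS101"), (456123, "DB201"), (876562, "GIS101")],
)

rows = conn.execute("""
    SELECT s.name, c.course_id
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.student_id
    JOIN course AS c ON c.course_id = e.course_id
    ORDER BY s.name, c.course_id
""").fetchall()
print(rows)
# [('Jane Doe', 'DB201'), ('Jane Doe', 'GIS101'), ('Jim Smith', 'GIS101')]
```

    A student can appear in many enrolment rows and so can a course, which is what makes the relation many-to-many. A 1:M relation needs no third table, only a foreign key on the "many" side.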

  • Database design perspectives

    Prof. Börje Langefors recognized the importance of three contexts in the infological approach. They are the “organizational context, wherein organized collections of people/individuals are perceived; the language context, wherein organized collections of symbols and linguistic behaviors are perceived; and technical context, wherein organized collections of technical artifacts (computers, telecommunication technologies, software) are perceived” (Iivari & Lyytinen, 1998; p. 170).

    http://isworld.student.cwru.edu/tiki/tiki-index.php?page=Langefors_Review

    This quote also describes the situation with respect to a GIS within an organization.

  • Summary

    Database – an integrated set of data on a particular subject

    Databases offer many advantages over files

    Relational databases still dominate, but they are losing ground

    Database design issues require careful consideration

    Newer models, such as NoSQL and “NewSQL”, are rising as cloud computing and unlimited storage capacity become commonplace

    http://en.wikipedia.org/wiki/NoSQL

    http://newsql.sourceforge.net/
