Developing a Tutorial for Grouping Analysis in ArcGIS

Daniel Pierre, May 29, 2014
Aug 11, 2014

This presentation describes tools and possible workflows using the Grouping Analysis tool in ArcGIS. The tutorial developed from this material highlights practical usage of Grouping Analysis with additional tools to solve real-world problems in two scenarios and is suitable for ArcGIS users at any level of experience. The tutorial was produced as a Major Research Project in GIS for Business at the Centre of Geographic Sciences, sponsored by Esri.
Transcript
Page 1: Developing a Tutorial for Grouping Analysis in ArcGIS

Developing a Tutorial for Grouping Analysis in ArcGIS

Daniel Pierre

May 29, 2014

Page 2: Developing a Tutorial for Grouping Analysis in ArcGIS

1. Introduction

2. Data

3. Grouping Analysis Workflows

4. Tutorial Exercises

5. Conclusions: Recommendations

Presentation Outline

Page 3: Developing a Tutorial for Grouping Analysis in ArcGIS

Lauren Rosenshein Bennett, MS

Geoprocessing Product Engineer, [email protected]

Dr. Konrad Dramowicz

Faculty, Centre of Geographic Sciences

[email protected]

Dr. Ela Dramowicz

Faculty, Centre of Geographic Sciences

[email protected]

Introduction

Project Sponsor & Supervisors

Page 4: Developing a Tutorial for Grouping Analysis in ArcGIS

Introduction

• Experimental testing of tool with multiple datasets

• Incorporation of Grouping Analysis with other tools

• Review of technical literature on clustering algorithms

• Review of existing tutorials

Project Overview

Page 5: Developing a Tutorial for Grouping Analysis in ArcGIS

Introduction

• Introduced at ArcGIS 10.1

• Available with Basic, Standard and Advanced license levels

• Found in the Spatial Statistics toolbox, within the Mapping Clusters toolset

• Script tool

Grouping Analysis Tool

Page 6: Developing a Tutorial for Grouping Analysis in ArcGIS

Introduction

• “...Performs a classification procedure that tries to find natural clusters in your data.” - Esri

• An aid for data comprehension

• Feature similarity is based on attributes specified as analysis fields and, optionally, spatial constraints

• Given a number of groups, features within each output group are as similar as possible while groups are as different as possible

Grouping Analysis Tool

Page 7: Developing a Tutorial for Grouping Analysis in ArcGIS

Introduction

• Two algorithm types: cluster analysis (traditional K-means) and regionalization (spatial K-means)

• Thirteen parameters (six required)

• Grouping results contingent on the number of groups, analysis fields, and type of spatial constraint

Grouping Analysis Tool

Page 8: Developing a Tutorial for Grouping Analysis in ArcGIS

Data

Features:

• Esri

• City of Vancouver

Multivariate Data:

• World Bank

• BBC

• Weatherbase

• Statistics Canada

Data Sources

Page 9: Developing a Tutorial for Grouping Analysis in ArcGIS

Data

• Data Enrichment (ArcGIS Online)

• HTML table import

• Spreadsheet reformatting

• Table joins

• Feature class edits

Data Preparation

Page 10: Developing a Tutorial for Grouping Analysis in ArcGIS

Data

Selection Criteria:

• Two scales of analysis

• Illustration of various spatial constraint effects on results

• Sufficient number of features

• Visible spatial patterns in results

Tutorial Datasets

Page 11: Developing a Tutorial for Grouping Analysis in ArcGIS

General Steps:

• Exploratory data analysis

• Preprocessing

• Determining appropriate Grouping Analysis settings

• Postprocessing, interpretation and evaluation of results

Grouping Analysis Workflows

Page 12: Developing a Tutorial for Grouping Analysis in ArcGIS

Exploratory Data Analysis

1. Distribution of variable values

• Thematic mapping

• Spatial autocorrelation

2. Spatial relationships among features

• Contiguity of features and number of neighbours

• Spatial autocorrelation

Exploratory Data Analysis

Page 13: Developing a Tutorial for Grouping Analysis in ArcGIS

Exploratory Data Analysis

• Explore distribution of dataset variables

• Choropleth maps and graduated symbol maps

• Identify set of variables to be used for Grouping Analysis

Thematic Mapping

Page 14: Developing a Tutorial for Grouping Analysis in ArcGIS

Exploratory Data Analysis

• Analyze contiguity relationships among features

• Polygon Neighbors tool

• Determine relative connectivity of features by counting number of neighbours

• Frequency tool

Spatial Relationships
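
The neighbour-counting step above can be sketched in plain Python. This is only an illustration of the idea, not the Polygon Neighbors or Frequency tools themselves: the `pairs` list is hypothetical sample data, assumed to hold one (source, neighbour) record per contiguity relationship, and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical contiguity table: one (source, neighbour) record per relationship,
# listed in both directions, similar in spirit to a polygon-neighbour table.
pairs = [
    ("A", "B"), ("B", "A"),
    ("B", "C"), ("C", "B"),
    ("C", "D"), ("D", "C"),
    ("B", "D"), ("D", "B"),
]

def neighbour_counts(pairs):
    """Count how many neighbours each source feature has."""
    return Counter(src for src, _ in pairs)

def connectivity_frequency(counts):
    """Count how many features share each neighbour count."""
    return Counter(counts.values())

counts = neighbour_counts(pairs)
for feature, n in sorted(counts.items()):
    print(feature, n)
print(dict(connectivity_frequency(counts)))
```

Features with no neighbours at all (islands, for example) never appear in a pair list like this one, so they would need to be added with a count of zero before the frequencies are interpreted.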

Page 15: Developing a Tutorial for Grouping Analysis in ArcGIS

Exploratory Data Analysis

• Analyze contiguity and/or proximity relationships among features using GeoDa

• Create spatial weights

• Display histogram of feature connectivity according to defined spatial relationships

• Histogram linked to map and attribute table

Alternative Approach

Page 16: Developing a Tutorial for Grouping Analysis in ArcGIS

Exploratory Data Analysis

• Considers attribute values and location of features simultaneously

• Moran’s I statistic determines whether spatial pattern of values is dispersed, random or clustered

• Significance of pattern evaluated with corresponding z-score

• One variable at a time

Spatial Autocorrelation
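
As a rough illustration of what the Moran's I statistic measures, the sketch below computes the global index for one variable using binary contiguity weights. It is a minimal sketch under stated assumptions, not the ArcGIS implementation: the `chain` neighbour dictionary and the sample values are hypothetical, weights are not row-standardized, and the z-score used to judge significance is not computed.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights (1 if j is a neighbour of i, else 0).

    values     -- list of attribute values, indexed by feature
    neighbours -- dict mapping each feature index to a list of neighbour indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights; with binary weights it is the number of links.
    s0 = sum(len(nbrs) for nbrs in neighbours.values())
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbours.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("need at least one link and some variation in values")
    return n * cross / (s0 * denom)

# Hypothetical layout: four features in a row, each touching the next one.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(morans_i([1, 1, 5, 5], chain))  # similar values side by side: positive index
print(morans_i([1, 5, 1, 5], chain))  # alternating values: negative index
```

Positive values point toward clustering and negative values toward dispersion. Because the statistic handles one variable at a time, the function would be called once per candidate analysis field.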

Page 17: Developing a Tutorial for Grouping Analysis in ArcGIS

Preprocessing

Use hot spots to limit study area for Grouping Analysis:

• Calculate incremental spatial autocorrelation

• Identify distance band of most intense clustering

• Create hot spot map

• Select features from original dataset based on location of hot spots

Preprocessing

Page 18: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

1. How many groups should be created?

2. Which analysis fields should be used?

3. Is a spatial constraint necessary? If so, which type is appropriate?

Grouping Analysis Settings: Key Considerations

Page 19: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

• Default number is 2

• Sturges' rule:

C = 1 + 3.3 log10(n), where C is the number of groups and n is the number of features

• Evaluate the optimal number of groups (up to a maximum of 15)

Number of Groups
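
Sturges' rule is simple enough to evaluate directly. The sketch below assumes a base-10 logarithm (consistent with the 3.3 coefficient) and rounds to the nearest whole number; the rounding choice and the function name are assumptions made for this example, not something the slide specifies.

```python
import math

def sturges_groups(n_features: int) -> int:
    """Suggested number of groups, C = 1 + 3.3 * log10(n), rounded to an integer."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return round(1 + 3.3 * math.log10(n_features))

for n in (10, 100, 1000):
    print(n, sturges_groups(n))
```

The result is only a starting point: it can be compared against the tool's own evaluation of the optimal number of groups, which considers up to 15 groups.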

Page 20: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

Two vs. Three Groups

Page 21: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

• Generally driven by research purpose and objectives of grouping

• Guide selection of analysis fields with exploratory data analysis findings

• Spatial variables may be used as indirect spatial constraints

• Assess effectiveness of fields to distinguish features with output report

Analysis Fields

Page 22: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

Temperature: Spatial Variable

Page 23: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

• Choice of spatial constraint or no spatial constraint determines which algorithm is used for grouping

• No spatial constraint – traditional K-Means (data space only)

• Any spatial constraint – Spatial ‘K’luster Analysis by Tree Edge Removal (SKATER) method (spatial K-Means)

Spatial Constraints
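
For readers unfamiliar with the unconstrained case, traditional K-means can be sketched in a few lines. This is a generic textbook version (Lloyd's algorithm), not the code inside the Grouping Analysis tool: seeding with the first k points is a simplifying assumption, the sample points are hypothetical, and the SKATER method used for spatially constrained runs is not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points in data space."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Assign each point to one of k groups by alternating two steps:
    (1) assign every point to its nearest centre, (2) move each centre
    to the mean of its members. Stops when assignments no longer change."""
    centres = [list(p) for p in points[:k]]  # naive seeding (assumption)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty group's centre where it is
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical features described by two analysis fields.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
labels, centres = kmeans(points, 2)
print(labels)
print(centres)
```

Nothing in this procedure looks at where features are located, which is why groups produced without a spatial constraint can be scattered across the map.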

Page 24: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

No Spatial Constraint vs. Spatial Constraint

Page 25: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

• Contiguity – edges only (“rook” type) or edges and corners (“queen” type)

• Delaunay triangulation – contiguity of representations of features as Voronoi polygons

• Proximity – K nearest neighbours

• Spatial weights

Spatial Constraint Types

Page 26: Developing a Tutorial for Grouping Analysis in ArcGIS

Grouping Analysis Settings

• Evaluate optimal number of groups

• Guide selection of analysis fields with calculated R² values

• Visually assess results of specified spatial constraint

Iterative Process for Optimizing Grouping Analysis
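
To make the role of R² concrete, the sketch below computes it for a single analysis field, assuming the usual definition R² = (TSS - ESS) / TSS, where TSS sums squared deviations from the global mean and ESS sums squared deviations from each group's own mean. The function name and sample values are hypothetical, and the figures in the tool's output report should be treated as authoritative.

```python
def r_squared(values, labels):
    """Share of a field's variation that is accounted for by group membership."""
    mean = sum(values) / len(values)
    tss = sum((v - mean) ** 2 for v in values)
    if tss == 0:
        raise ValueError("field has no variation")
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    # Squared deviations measured group by group, around each group's mean.
    ess = sum(
        sum((v - sum(members) / len(members)) ** 2 for v in members)
        for members in groups.values()
    )
    return (tss - ess) / tss

labels = [0, 0, 1, 1]
print(r_squared([1, 2, 9, 10], labels))  # field separates the groups well
print(r_squared([5, 6, 5, 6], labels))   # field does not distinguish the groups
```

A field with a value near 1 discriminates strongly among the groups, while a field near 0 contributes little and is a candidate for removal in the next iteration.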

Page 27: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Spatial distribution of groups (map)

• Global statistics (output report)

• Group and variable statistics (output report)

• Group profiles

Interpretation of Results

Page 28: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Compare group means with each other and global range

Group Profiles

Page 29: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Compare group means and ranges for each variable

Group Profiles (2)

Page 30: Developing a Tutorial for Grouping Analysis in ArcGIS

• Consider global mean, median and range for each variable

Group Profiles (3)

Interpretation & Evaluation

Page 31: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Global Moran’s I statistic

• Determine spatial pattern of group membership

• Measure spatial compactness of group membership

• Clustered groups generally desired

Evaluation of Results: Spatial Autocorrelation

Dispersed

Clustered

Random

Page 32: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Smallest to largest group

• Indicator of balance in group membership

• Balanced number of group members generally desired for comparison of statistics

• Frequency tool

Evaluation of Results: Cluster Size Ratio
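
The cluster size ratio can be computed from a list of group identifiers, one per feature. A minimal sketch, assuming the ratio is defined as the size of the smallest group divided by the size of the largest; the function name and sample identifiers are hypothetical.

```python
from collections import Counter

def cluster_size_ratio(group_ids):
    """Smallest group size divided by largest group size (1.0 = perfectly balanced)."""
    sizes = Counter(group_ids)
    return min(sizes.values()) / max(sizes.values())

print(cluster_size_ratio([1, 1, 1, 2, 2, 3]))  # unbalanced membership
print(cluster_size_ratio([1, 1, 2, 2, 3, 3]))  # balanced membership
```

Values close to 1 indicate balanced membership, and values close to 0 indicate that at least one group is much smaller than the largest.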

Page 33: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Goodness measure that combines concepts of cohesion and separation

• Adapted from cluster analysis to consider attribute data and location

• Silhouette coefficient is calculated for every feature and the average is taken for the entire dataset

Evaluation of Results: Silhouette

Page 34: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

(B – A) / max(A, B) where

A is the distance between a feature and its group center

B is the distance between the feature and its neighbouring group center

Silhouette Coefficient
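
The formula above can be applied feature by feature and then averaged. The sketch below follows the centre-based definition on this slide (distance to the feature's own group centre versus the nearest other group centre), which differs from the classical silhouette based on mean distances to group members. The function name and coordinates are hypothetical, and at least two groups are assumed.

```python
import math

def silhouette(points, labels):
    """Average of (B - A) / max(A, B) over all features, where A is the distance
    to the feature's own group centre and B the distance to the nearest other one."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("at least two groups are required")
    centres = {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in groups.items()
    }
    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centres[lab])
        b = min(math.dist(p, c) for other, c in centres.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(silhouette(points, [0, 0, 1, 1]))  # compact, well-separated groups
print(silhouette(points, [0, 1, 0, 1]))  # groups that cut across the natural clusters
```

The coordinates passed in can combine attribute values and location, in line with the adaptation described on the previous slide; how those dimensions are scaled relative to one another is left open here.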

Page 35: Developing a Tutorial for Grouping Analysis in ArcGIS

Interpretation & Evaluation

• Range between –1 (poor) and 1 (excellent)

• < 0.2 indicates poor clustering

• > 0.5 indicates good partition of the data

Silhouette Coefficient Values

Page 36: Developing a Tutorial for Grouping Analysis in ArcGIS

Tutorial Exercises

• Six exercises

• Two scenarios (3 exercises for each)

• Suitable for users at all levels of experience

• Exercises take the user through the steps of preprocessing, group creation, interpretation and evaluation of results outlined here

Grouping Analysis Tutorial

Page 37: Developing a Tutorial for Grouping Analysis in ArcGIS

Tutorial Exercises

Exercises:

1. Data exploration

2. Grouping for exploratory data analysis

3. Using Spatial Statistics tools to target areas of interest

Scenario 1: Analysis of Crime in Chicago

Page 38: Developing a Tutorial for Grouping Analysis in ArcGIS

Tutorial Exercises

Exercises:

4. Create groups and use results to write profiles

5. Explore effects of spatial constraints

6. Evaluation of results

Scenario 2: Analysis of Olympic Results

Page 39: Developing a Tutorial for Grouping Analysis in ArcGIS

Tutorial Exercises

1. All tutorial exercises use polygon data exclusively; point features not covered

2. Space-time constraints using spatial weights matrix file not covered

3. Aimed at the general user; no exercises specifically target advanced users

Limitations

Page 40: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

1. Exploratory data analysis

2. Grouping Analysis

3. Evaluation of results

Recommendations: Enhancements and Additional Tools

Page 41: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

• Multi-step process using Polygon Neighbors, Frequency and table joins could be simplified

• Dynamic linking of objects can make use of existing ArcGIS functionality

Determining Spatial Relationships Among Features

Page 42: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

• Expand types of spatial relationships that can be analyzed

• Enable the analysis of higher order relationships

Determining Spatial Relationships Among Features (continued)

Page 43: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

• Tools for determining most useful diagnostic or predictor variables

• Guide selection of analysis fields for data partitioning

• Adapt neural networks or other data mining tools to work with spatial constraints

Identification of Useful Diagnostic Variables

Page 44: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

Grouping Analysis Tool Enhancements

• Create unique identifier

• Replace null values

Page 45: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

• Spatial weights matrix can be used as the spatial constraint for creating groups

• Custom weights require either manual table creation or programming

• Solution: interactive feature selection

User-defined spatial relationships among features

Page 46: Developing a Tutorial for Grouping Analysis in ArcGIS

Recommendations

• Expand beyond R² and F-statistic values in output report

• Adapt methods used to evaluate cluster analysis algorithms (e.g. Silhouette)

• Challenge: universally applicable evaluation methods may not be feasible

Evaluation of Results

Page 47: Developing a Tutorial for Grouping Analysis in ArcGIS

THANK YOU!