Data Science Made Easy in ArcGIS Using Python and R
James Jones, Lu Zhang and Lain Graham
Workshop Agenda
• Introduction
• Python
• R
• Conclusion / Q&A
Data Science
• Core analytics in ArcGIS
- Maximize performance and utility
- E.g. Spatial Statistics, Geostatistics, Spatial Analyst
- E.g. GeoAnalytics, Insights, ArcGIS Python SDK
• The interoperability of the ArcGIS platform makes workflows more efficient
- Techniques and methodologies continue to develop
- Data availability continues to increase
• The data science community is vast and evolving
- ArcGIS extends directly via scripting APIs
- E.g. Python, R, Java
From Core to Community
Data Science Community
Python
Data Science Community
R
• Well over 12,000 packages to enhance core
• Most widely used statistical software in the world
• Diverse and powerful
- Universities, Government, Industry
- Finance, Ecology, Statistics
- Machine learning, predictive analytics
Battle of Bands
Which one is best?
• Functionality
- Python: General programming language
- R: Tailored towards statistics and data analysis
• Package Management
- Python: Yes (PyPI and Conda)
- R: Yes (CRAN)
• Scalability
- Python: Individual machines to distributed computing environments
- R: Standalone computer or individual servers
• Display of Data
- Python: Large number of libraries for graphical display of data
- R: Numerous libraries for making incredible graphics
• Use Cases
- Python: General Programming, Data Science, Web Development
- R: Lingua franca of data science
• Integrates with ArcGIS
- Python: YES!
- R: YES!
What are you most comfortable with?
What is the best tool for the job?
Data Science Process
• Ask a question
• Get the data
- Download
- Scrape
- ETL
- Enrich
• Explore the data
- Chart
- Summary & Descriptive Statistics
- Develop Hypotheses
- Identify Patterns
• Model the data
- Regression
- Machine Learning
- “Big Data”
• Communicate
- Jupyter Notebooks
- Written/Textual Reports
- Web Apps
- Story Maps
- PowerPoint
Getting the Data
Non-GIS Data
• Large number of libraries to download, transform, condition, and prepare data
• Generic web libraries (Requests, urllib)
• API-specific libraries (Tweepy)
• Ability to scrape and parse existing web sites (Scrapy and BeautifulSoup)
GIS Data
• ArcPy allows native access to all Esri data formats and the full functionality of ArcGIS Desktop
• ArcGIS API for Python allows for access and interaction with content within your Web GIS
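The non-GIS path above (download, then condition the data) can be sketched with the standard library alone; Requests would fill the same role as urllib here. This is a minimal illustration, not code from the workshop: the function names and the `x`/`y` column names are assumptions.

```python
import csv
import io
import urllib.request


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (assumes UTF-8 content)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_points(csv_text, x_field="x", y_field="y"):
    """Parse CSV text into dicts, converting the coordinate fields to floats.

    Rows with missing or non-numeric coordinates are skipped.
    The field names are hypothetical defaults; pass your own column names.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            row[x_field] = float(row[x_field])
            row[y_field] = float(row[y_field])
        except (KeyError, TypeError, ValueError):
            continue
        records.append(row)
    return records
```

A typical call chains the two steps, for example `parse_points(fetch_text(url))` with the URL of a CSV file you have access to. Keeping the download and the parsing separate makes the cleaning step easy to check against a small in-memory sample.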
Exploring Data
• Jupyter Notebooks are your friend
• ArcGIS API for Python was designed to work natively with the Jupyter Notebook
• Ability to create a Spatially Enabled DataFrame (SEDF)
• SEDF requires ArcGIS API for Python version 1.5 or later
• Built on the core Pandas DataFrame, which allows for rapid exploration of data
• Can create maps, charts and a variety of other objects quickly using a common syntax
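In a notebook, a first look at a numeric column is usually a single Pandas call such as `df.describe()`. As a dependency-free stand-in, the sketch below computes the same kind of summary with the standard `statistics` module. The `describe` helper is hypothetical and is not part of Pandas or the ArcGIS API for Python.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a sequence of numbers."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # The "inclusive" method interpolates linearly between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }
```

Calling `describe` on an attribute column pulled from a table gives a quick check of range and spread before charting or modeling.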
Modeling the Data
• Many native tools to enable modeling of data (Space-Time Pattern Mining, Density Based Clustering, etc…)
• Integration of popular third-party Machine Learning/Deep Learning libraries
- scikit-learn
- TensorFlow
- PyTorch
- NLTK
Apps
Desktop APIs
Machine Learning Tools in ArcGIS
Classification
• Maximum Likelihood Classification
• Random Trees
• Support Vector Machine
Prediction
• Empirical Bayesian Kriging
• Areal Interpolation
• EBK Regression Prediction
• Ordinary Least Squares Regression and Exploratory Regression
• Geographically Weighted Regression
Clustering
• Spatially Constrained Multivariate Clustering
• Multivariate Clustering
• Density-based Clustering
• Image Segmentation
• Hot Spot Analysis
• Cluster and Outlier Analysis
• Space Time Pattern Mining
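Of the prediction methods listed, Ordinary Least Squares is the simplest to show in a few lines: it picks the line that minimizes the sum of squared residuals. The sketch below is a single-predictor version in plain Python. It is an illustration of the idea only, not the ArcGIS tool or a scikit-learn call, and `ols_fit` is a hypothetical helper name.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

For real work with several explanatory variables and spatial diagnostics, the listed tools or a library such as scikit-learn are the better choice; this version only makes the underlying arithmetic visible.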
Jupyter Notebooks and data integration
Python Demo
The R-ArcGIS Bridge
An Introduction
R?
• A widely used statistical programming language
• More than 10,000 packages
• Powerful and open source
- Universities, Government, Industry
- Finance, Ecology, Statistics
- Machine learning, predictive analytics
ArcGIS?
• A GIS platform
• Open Standards / OGC Compliant
- Universities, Government, Industry
- Finance, Ecology, Statistics
• The R-ArcGIS bridge provides the ability to integrate R and ArcGIS functionality.
The R-ArcGIS Bridge
Diagram: ArcGIS, R script, R
Who Can Use the R-ArcGIS Bridge?
• GIS Analyst
• Data Scientist
• Developers
Version Requirements for the R-ArcGIS Bridge
• ArcGIS Pro: 1.1 (or later)
• ArcMap: 10.3.1 (or later)
• R: 3.2.2 (or later)
• RStudio: Recommended!
Installing the Bridge / Getting Started
• ArcGIS Pro Project Tab
• RStudio
Top 3 Most Useful R-ArcGIS Bridge Functions:
• arc.open()
- Reads GIS data that is currently stored in a shapefile, geodatabase, layer, or table into an R environment (containing spatial information and attribute information)
• arc.select()
- Obtain your data set in an R data frame object. Offers the ability to work with specific attributes from your data (dplyr / tidyr) and construct SQL queries
• arc.write()
- Allows you to easily save your data to a shapefile, geodatabase, or table in an ArcGIS environment
Vector Support
• Ability to read and write vector data
• Support for key R objects and spatial packages
- R data frame object
- Compatibility with sp
- Compatibility with sf
• Customize data manipulations
- Craft SQL queries / Subset by specific columns
- Reproject data as needed
• Maintain spatial geometries when working with dplyr
Points
Lines
Polygons
Raster Support
• Ability to read and write raster data
- Handle big raster data with the ability to read in chunks by bands
- Compatibility with CRF format and Mosaic Datasets
• Customize selections and subsets
- Create subsets by bands or pixel rows and columns
- Resample options available
- Select desired pixel format for specific analyses
How is Microsoft R useful?
Raster data can become a big data problem, quickly
Mosaics
Time
Mosaics: Data structure to store/process rasters in space and time
Time-Series Rasters/Mosaic
Microsoft R
Working with Big Data?
• Microsoft R Open is a publicly available distribution of R for big data
• Contains almost all CRAN libraries
• Window-based operations and image operators speed up drastically
• Set-up and Usage from Pro is exactly the same as traditional R
Predict Home Vacancy Rates in Washington DC based on Vacant Home Locations in Baltimore (XY Coordinates)
R-ArcGIS Bridge Demo
Ask a question
Get the data
Explore the data
Model the data
Communicate
ArcGIS
ArcGIS / RStudio
RStudio
ArcGIS
Resources
Learn More on Using the R-ArcGIS Bridge
Resources from UC 2018:
- https://github.com/R-ArcGIS
Getting Started:
- Analyzing Crime Using Statistics and the R-ArcGIS Bridge Learn Lesson
- Using the R-ArcGIS Bridge Introductory Web Course
Creating R Script Tools:
- Integrating R Scripts into Geoprocessing Tools Web Course
- arcgisbinding Package Vignette
Powerful, In-depth Workflows in ArcGIS and R:
- Identify an Ecological Niche for African Buffalo
Upcoming Sessions at Developers Summit (1/31/2019)
• ArcGIS API for Python: Introduction to Scripting your Web GIS
- Thursday, January 31st @ 10:30 am
• ArcGIS Pro: Scripting with Python
- Thursday, January 31st @ 10:30 am
• ArcGIS API for Python: Advanced Scripting
- Thursday, January 31st @ 1:30 pm
• Expand your Analysis with ArcGIS GeoAnalytics Server
- Thursday, January 31st @ 2:30 pm
• Maps & Charts: Making Visual Interfaces Accessible
- Thursday, January 31st @ 3:30 pm
Print Your Certificate of Attendance
Print Stations Located at L Street Bridge
Wednesday
6:30 pm – 9:00 pm Networking Reception
National Building Museum
Please Take Our Survey on the App
• Download the Esri Events app and find your event
• Select the session you attended
• Scroll down to find the feedback section
• Complete answers and select “Submit”