2005 Ohio GIS Conference September 21-23, 2005 Marriott North Hotel Columbus, Ohio Geoprocessing for Animal Premises ID Luanne Hendricks State of Ohio OIT/GISSC Intern Columbus State Community College
Dec 27, 2015
2005 Ohio GIS Conference
September 21-23, 2005
Marriott North Hotel
Columbus, Ohio
Geoprocessing for
Animal Premises ID
Luanne Hendricks
State of Ohio OIT/GISSC Intern
Columbus State Community College
Overview
• Objective
• Source Data & Desired Outputs
• Timeline
• Tools and Automation
• Process
• Statistics
• Observations
Objective
Geoprocessing
Input:Source Data
from County Auditors
Output: - Normalized Parcel Data - Unique AG Owners
Output - Deliverables
• Normalized Parcel/Point Geodata
– agricultural (100 <= LUC <= 199)
– dairy (LUC = 103, 113)
– residential (510 <= LUC <= 520, LUC = 560)
• Normalized Tabular Data (Access DB)
– Table of unique ag owners with owner_id
– Table of parcel data with owner_id
• Time Estimate to regenerate data annually
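The land use code (LUC) ranges above can be written as a small classifier. This is a minimal sketch, not the project's actual code: the function name `classify_luc` is hypothetical, and it assumes LUC values are available as integers. Note that the dairy codes (103, 113) also fall inside the agricultural range.

```python
def classify_luc(luc):
    """Return the set of deliverable categories a land use code falls into."""
    categories = set()
    if 100 <= luc <= 199:
        categories.add("agricultural")
    if luc in (103, 113):
        # Dairy codes are a subset of the agricultural range.
        categories.add("dairy")
    if 510 <= luc <= 520 or luc == 560:
        categories.add("residential")
    return categories


for code in (103, 150, 515, 560, 400):
    print(code, sorted(classify_luc(code)))
```

A code outside all three ranges returns an empty set, so such parcels can be dropped from the deliverables.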
Example: Locate Residential Parcels of Ag Land Owners
Example: Select Parcels owned by Owner ID = 2894
Owner to Parcel Table Example
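The two example queries can be sketched against an owner table and a parcel table that share owner_id. The project used an Access database; this sketch substitutes Python's built-in sqlite3 so it can run anywhere. All table names, column names, and rows below are hypothetical, made up for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, owner_name TEXT);
    CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, owner_id INTEGER, luc INTEGER);
""")
# Hypothetical sample rows, not project data.
conn.executemany("INSERT INTO owner VALUES (?, ?)",
                 [(2894, "SAMPLE OWNER A"), (2895, "SAMPLE OWNER B")])
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?)",
                 [("P-001", 2894, 110), ("P-002", 2894, 511),
                  ("P-003", 2895, 510), ("P-004", 2894, 103)])


def parcels_for_owner(conn, owner_id):
    """Select all parcels owned by one owner id."""
    return conn.execute(
        "SELECT parcel_id, luc FROM parcel WHERE owner_id = ? ORDER BY parcel_id",
        (owner_id,)).fetchall()


def residential_parcels_of_ag_owners(conn):
    """Residential parcels whose owner also owns at least one ag parcel."""
    return conn.execute("""
        SELECT DISTINCT r.parcel_id, r.owner_id
        FROM parcel AS r
        JOIN parcel AS a ON a.owner_id = r.owner_id
        WHERE (r.luc BETWEEN 510 AND 520 OR r.luc = 560)
          AND a.luc BETWEEN 100 AND 199
        ORDER BY r.parcel_id
    """).fetchall()


print(parcels_for_owner(conn, 2894))
print(residential_parcels_of_ag_owners(conn))
```

The self-join on owner_id is what the normalized owner_id column makes possible; without it the match would have to be made on free-text owner names.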
Source Data – Quantity/Quality
• Large volume of data
– approx. 5 million source records
– some counties had 40-50 fields of data
– approx. 5 GB of data
• Multiple source files per county
• Parcel, Point, CAMA data
• Non-standardized data fields
• Variable completeness
Example: Non-Normalized Source vs. Normalized Output
Processing – High Level View
Data Collection from Counties
Normalize Source Data
Generate Owner Ids for Parcel Records
Generate Owner Table
Match Dairy Addresses to Parcel Table
Create Project for User
Timeline
First Pass
Effort: Several PT HC - Approx. 1 FT HC
Tasks: Data Collection & Geocoding; Normalizing; Owner IDs; Dairy Match; Create Project
Month: January, February, March, April, May
Second Pass
Effort: 1 PT HC, 1 FT HC
Tasks: Identify Original Source used; Manual Normalizing; Automation; Normalizing; Owner IDs; Dairy match; Project
Month: May, June, July, August, September
Need Automation Strategy
• Need to automate process for:
– Repeatability
– Ease of modification
– Testability
– Traceability
• ...As well as speed
Tools and Processing Tasks
• ArcToolBox (Model Builder; script development in Python and VBScript)
– Pre-Normalization: joining source files, adding key id, copying to working directory
– Pre-Owner ID Generation: address standardization, rejoin data file to shapefile
• MS Access (VBA, queries, SQL, form interface)
– Normalization
– Owner ID & Owner Table
– (Dairy Match)
Processing Detail - Example
Pre-normalization steps in Model-Builder for a county with 2 source files, shape and CAMA, that need to be joined. This county is now ready for normalization in Access. Slightly different steps are needed for point files and counties with a single source parcel shapefile.
Processing Detail - Example Continued
Model-Builder has limitations – you can’t loop through these steps for a list of counties. But this model can be converted to script and coded to process a list. Additional field-name mapping steps are needed due to the “coarse-grained” geoprocessing object.
Loop through county list.
Delete temporary table view & layer
Get Fields
Make Field Map
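The loop that Model-Builder cannot express can be sketched in plain Python. This is an illustrative outline only: it does not call the geoprocessing object, and the step functions (`join_sources`, `make_field_map`, `delete_temporaries`) are hypothetical stand-ins for the exported model steps. The county names are sample values.

```python
def process_counties(counties, steps, cleanup):
    """Run every step for every county; always clean up temporaries."""
    status = {}
    for county in counties:
        try:
            for step in steps:
                step(county)
            status[county] = "ok"
        except Exception as exc:
            # Record the failure and continue with the next county.
            status[county] = "failed: %s" % exc
        finally:
            # Delete the temporary table view and layer even after an error.
            cleanup(county)
    return status


log = []

def join_sources(county):
    log.append(("join", county))

def make_field_map(county):
    log.append(("field_map", county))

def delete_temporaries(county):
    log.append(("cleanup", county))

status = process_counties(["Adams", "Allen"],
                          [join_sources, make_field_map],
                          delete_temporaries)
print(status)
```

Keeping the per-county status lets a failed county be rerun on its own, which supports the repeatability and traceability goals stated earlier.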
Example of Geoprocessing Tool Limitations
When you join fields in the geoprocessing environment and create a new Feature Layer shapefile, field names are [original layer name].[field name] truncated to 10 characters. Renaming is not done automatically for you as it is when you join and create a new layer manually in ArcMap.
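The renaming step can be sketched as a lookup from each truncated name back to the original field name. This sketch only models the truncation rule described above; how the tool resolves two names that truncate to the same 10 characters is not modeled, so collisions are reported for manual handling. The layer names `pcl` and `cama` and the field `MAILADD2` are hypothetical.

```python
def truncated_join_name(layer, field, limit=10):
    """Model '[layer name].[field name]' cut to `limit` characters."""
    return ("%s.%s" % (layer, field))[:limit]


def build_rename_map(layer_fields, limit=10):
    """Map truncated names to original field names.

    Returns (rename_map, collisions). Names that truncate to the same
    string are ambiguous and are listed in `collisions` instead.
    """
    rename_map = {}
    collisions = {}
    for layer, field in layer_fields:
        short = truncated_join_name(layer, field, limit)
        if short in rename_map or short in collisions:
            collisions.setdefault(short, [])
            if short in rename_map:
                collisions[short].append(rename_map.pop(short))
            collisions[short].append(field)
        else:
            rename_map[short] = field
    return rename_map, collisions


fields = [("pcl", "PARCELID"), ("cama", "OWNNAM1"), ("cama", "OWNADD1"),
          ("cama", "MAILADD1"), ("cama", "MAILADD2")]
rename_map, collisions = build_rename_map(fields)
print(rename_map)
print(collisions)
```

Short layer names leave more of the 10 characters for the field name, which reduces collisions.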
Python Script Example
Access Form Interface Used for Normalization
Example: Non-Normalized Source vs. Normalized Output
Normalization Mapping Table
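A normalization mapping table can be sketched as a per-county lookup from source field names to the standard names. This is an illustration, not the Access implementation: the county labels and source field names are hypothetical, and treating LUC as a column name is an assumption. The standard names OWNNAM1, OWNADD1, MAILADD1, and SITEADD are the ones used elsewhere in these slides.

```python
STANDARD_FIELDS = ("OWNNAM1", "OWNADD1", "MAILADD1", "SITEADD", "LUC")

# Hypothetical mapping rows: county -> {source field: standard field}.
FIELD_MAPPING = {
    "County A": {"OWNER": "OWNNAM1", "OWN_ADDR": "OWNADD1",
                 "MAIL_ADDR": "MAILADD1", "LOCATION": "SITEADD",
                 "LANDUSE": "LUC"},
    "County B": {"NAME1": "OWNNAM1", "ADDR1": "OWNADD1", "CLASS": "LUC"},
}


def normalize_record(county, record):
    """Copy mapped source fields into the standard layout.

    Standard fields with no source column stay blank; unmapped source
    fields are dropped.
    """
    mapping = FIELD_MAPPING[county]
    normalized = dict.fromkeys(STANDARD_FIELDS, "")
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:
            normalized[target] = value
    return normalized


print(normalize_record("County B",
                       {"NAME1": "DOE JANE", "ADDR1": "1 ELM ST",
                        "CLASS": 110, "ACRES": 40}))
```

Keeping the mapping as data rather than code means a new county only needs new mapping rows.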
Processing – Owner IDs
Data Collection from Counties
Normalize Source Data
Generate Owner Ids for Parcel Records
Generate Owner Table
Match Dairy Addresses to Parcel Table
Create Project for User
Owner ID and Owner Table Generation
Standardized vs. Un-standardized
Owner ID Algorithm
• Aggregate on Lastname, Firstname
• Standardize addresses
• For each Lastname,Firstname group, choose the address - OWNADD1, MAILADD1, or SITEADD, that produces the best set of matches
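One possible reading of the three steps above, as a sketch. The slides do not define "best set of matches"; here it is assumed to mean the address field whose standardized values place the most records of a name group into shared addresses. The field names LASTNAME, FIRSTNAME, and OWNER_ID, the abbreviation list, and the sequential id scheme are all hypothetical.

```python
import re
from collections import Counter, defaultdict

ADDRESS_FIELDS = ("OWNADD1", "MAILADD1", "SITEADD")
ABBREVIATIONS = {"STREET": "ST", "ROAD": "RD", "AVENUE": "AVE", "DRIVE": "DR"}


def standardize_address(address):
    """Uppercase, strip punctuation, and abbreviate common street types."""
    words = re.sub(r"[^A-Z0-9 ]", " ", (address or "").upper()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def best_address_field(records):
    """Choose the field that puts the most records into shared addresses."""
    best_field, best_score = ADDRESS_FIELDS[0], -1
    for field in ADDRESS_FIELDS:
        counts = Counter(standardize_address(r.get(field)) for r in records)
        counts.pop("", None)  # blank addresses do not count as matches
        score = sum(n for n in counts.values() if n > 1)
        if score > best_score:
            best_field, best_score = field, score
    return best_field


def assign_owner_ids(records):
    """Group by name, then split each group by its best address field."""
    groups = defaultdict(list)
    for rec in records:
        name = (rec["LASTNAME"].strip().upper(), rec["FIRSTNAME"].strip().upper())
        groups[name].append(rec)
    owner_ids = {}
    for name in sorted(groups):
        field = best_address_field(groups[name])
        for rec in groups[name]:
            # Assumption: same name + same standardized address = same owner.
            key = name + (standardize_address(rec.get(field)),)
            rec["OWNER_ID"] = owner_ids.setdefault(key, len(owner_ids) + 1)
    return records
```

For example, two "Smith, John" records with OWNADD1 values "12 Main Street" and "12 MAIN ST." standardize to the same address and receive the same owner id, even if their mailing and site addresses differ.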
Statistics
ORIG_REC = Total AG + Total Residential
NOAD = # Records with no address information
ADD_REC = Total # of AG + Total Residential associated with more than 1 parcel
FINL_REC = Total # of AG + Total Residential associated with at least one AG pcl
OWNR = # of Records in the Owner Table
NMD_AG = Aggregate of OWNNAM1/MAILADD1 and OWNADD1/MAILADD1, as a sanity check and to compare how effective the processing was
Testing
• Use Statistics
– Numbers make sense
– Numbers add up, e.g.:
• All records in Parcel table assigned an ownerid
• # Records in Owner Table = # Aggregated on Owner Id in PCL table
• Visual Inspection
– Visually inspect how Owner Ids were assigned
– Create shapefile and view data in project
– Spot check source vs. processed data in shapefiles
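The two "numbers add up" checks can be sketched as a function over the parcel and owner tables. This is a minimal illustration that assumes each table is loaded as a list of dictionaries; the key names PARCEL_ID and OWNER_ID are hypothetical.

```python
def check_owner_tables(parcels, owners):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Check 1: every parcel record has an owner id.
    unassigned = [p["PARCEL_ID"] for p in parcels if p.get("OWNER_ID") is None]
    if unassigned:
        problems.append("parcels without owner id: %s" % ", ".join(unassigned))

    # Check 2: owner table size equals distinct owner ids in the parcel table.
    ids_in_parcels = set(p["OWNER_ID"] for p in parcels
                         if p.get("OWNER_ID") is not None)
    if len(owners) != len(ids_in_parcels):
        problems.append(
            "owner table has %d records but parcel table has %d distinct owner ids"
            % (len(owners), len(ids_in_parcels)))
    return problems


parcels = [{"PARCEL_ID": "P-001", "OWNER_ID": 1},
           {"PARCEL_ID": "P-002", "OWNER_ID": 1},
           {"PARCEL_ID": "P-003", "OWNER_ID": 2}]
owners = [{"OWNER_ID": 1}, {"OWNER_ID": 2}]
print(check_owner_tables(parcels, owners))
```

Running the checks after every regeneration gives a quick signal before the slower visual inspection.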
Status
• 53 counties normalized
• 40 counties have owner ids/owner table
• Dairy matching - to do
• Final project – to do
Example Project – Work in Progress
Observations and Conclusions (1)
• After initial development, Automation speeds process
• For example, using Form Interface to normalize:
Data Normalization    Time      Data Volume
Manual (1st pass)     6 days    1X (Ag only)
Auto (2nd pass)       1 day     5X (Ag + Res)
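The figures above imply a throughput ratio that can be worked out directly. This is back-of-the-envelope arithmetic, assuming "1X" and "5X" are relative data volumes and that the day counts are comparable working days.

```python
manual_days, manual_volume = 6, 1   # 1st pass: manual, Ag only
auto_days, auto_volume = 1, 5       # 2nd pass: automated, Ag + Res

# Throughput is volume per day; the speedup is the ratio of the two rates.
speedup = (auto_volume * manual_days) / (auto_days * manual_volume)
print("Automated throughput is about %dx the manual throughput" % speedup)
```

Under these assumptions the automated pass handled roughly 30 times as much data per day as the manual pass.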
Observations and Conclusions (2)
• Automation:
– speeds process after initial development investment
– enables repeatability of process
– makes modification and redo less painful
– increases data consistency
– reduces errors
– accurately documents process
– increases future capability to do similar processing – tools are reusable
• Automation is cost effective
Observations and Conclusions (3)
• This job would be easier if:
– Data was maintained in small standard components:
• Last Name, First Name, MI as separate fields
• Address components – SiteNum, SiteDir, SiteStr
• There was a standard for field names of components