Are Geodatabases a Suitable Long-Term Archival Format? Jeff Essic, Matt Sumner North Carolina State University Libraries 2009 ESRI International Users Conference July 14, 2009

Page 1

Are Geodatabases a Suitable Long-Term Archival Format?

Jeff Essic, Matt Sumner

North Carolina State University Libraries

2009 ESRI International Users Conference

July 14, 2009

Page 2

NC Geospatial Data Archiving Project (NCGDAP)

Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)

Focus on state and local geospatial content in North Carolina (state demonstration)

Website: http://www.lib.ncsu.edu/ncgdap

Page 3

Geospatial Data Preservation Challenge: Vector Data Formats

No widely-supported, open vector formats for geospatial data

Spatial Data Transfer Standard (SDTS) not widely supported

Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”

Spatial Databases

The whole is more than the sum of the parts, and the whole is very difficult to preserve

Can export individual data layers for curation, but relationships and other context are lost

Page 4

Challenge: Other Data Types

Cartographic Representation: Software Project Files, PDFs, GeoPDFs, WMS images

Web 2.0 content: Street views, Mashups

Oblique Imagery

3D Models

Page 5

Different Ways to Approach Preservation

Technical solutions: How do we preserve acquired content over the long term?

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?

Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata

Page 6

Question: Frequency of Capture?

Content Exchange – Getting Data in Motion

Repository Development

Repository of Temporal Data Snapshots

Page 7

Repository Development

Downloading or acquiring “low hanging fruit”

Tapping into current data flows

Developing our own metadata when necessary

Converting and preserving vector data in shapefile format

Page 8

Data Preservation Like Fruit Desiccation?

Complex data representations can be made more preservable (yet less useful) through simplification.

Conversion of various formats to shp

Image outputs (web services, PDF maps, map image files)

Open GeoPDF standard

Analogous to paper maps

Combines data, symbology, annotation

More data intelligence than simple image

PDF content retained in addition to, NOT instead of, data

Page 9

Archival and Long Term Access Working Group

Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data

Federal, state, regional, and local agency representation

Key focus:

Best practices for data snapshots and retention

State Archives processes: appraisal, selection, retention schedules, etc.

Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.

Page 10

Archival and Long Term Access Working Group

Final Report approved by NC GICC in November, 2008

Best Practices for:

Archiving Schedule

Inventory

Storage Medium

Formats

Naming

Metadata

Distribution

Periodic Review

Data Integrity

Publicity

http://www.ncgicc.org/

Wake County adopted, providing archived data online: http://www.wakegov.com/gis/download_data.htm
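The practices listed on this slide are organizational, but the data integrity item lends itself to a small illustration. The following is a minimal sketch, assuming SHA-256 fixity checks over a snapshot folder; the slides do not name an algorithm or tool, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(snapshot_dir):
    """Map each file's path (relative to the snapshot folder) to its SHA-256 digest."""
    root = Path(snapshot_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(snapshot_dir, manifest):
    """List files that were added, removed, or altered since the manifest was built."""
    current = build_manifest(snapshot_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at archiving time can be stored with the snapshot and re-checked at each periodic review; an empty list from `changed_files` means every file still matches.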

Page 11

NDIIPP Multi-State Geospatial Project

Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC

Partners: Leading state geospatial organizations of Kentucky and Utah

State Archives of Kentucky and Utah

NCSU Libraries in catalytic/advisory role

State-to-state and geo-to-Archives collaboration

Archives as part of statewide Spatial Data Infrastructure

Page 12

Geodatabase Curation Study: Overview

Three types of Geodatabases: Personal, File, SDE

Curation/Conversion options:

Archive GDB object

Export to: XML, shapefiles, GML Simple Features (open published formats)

Consideration given to objects and export files created in older ArcGIS versions - Will they be compatible with newer versions?

Page 13

Caveats

Only tested what appeared to be the most reasonable and logical conversion options. Numerous other possibilities not tested.

Some conversions required running overnight. Limited time for testing multiple datasets and scenarios.

Didn’t explore GDBs with rasters.

Very limited geodatabase experience or expertise.

Page 14

Personal Geodatabase

Not ideal archival object

Very proprietary – ArcGIS / MS Access formats

ESRI now recommends using File GDB instead

http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases

Archive export formats: XML, shapefiles

Page 15

File Geodatabase

Potential archival object

Kentucky KYGEONET

ESRI working on low-level (non-ArcObjects-based) API (http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/ and http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578)

Folder/File structure

Can see “under the hood”

Requires knowledge of component parts

Archive export formats: XML, shapefiles, GML
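Because a file geodatabase is an ordinary folder of files, its component parts can be listed without ArcGIS. The following is a minimal sketch, assuming the geodatabase is a flat folder on disk; the function name is hypothetical, and the code only counts files by extension without interpreting their contents.

```python
from collections import Counter
from pathlib import Path


def inventory_gdb(folder):
    """Return (file count, total size in bytes, Counter of file extensions)."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    by_extension = Counter(p.suffix.lower() for p in files)
    return len(files), total_bytes, by_extension
```

Called on a folder such as `transportation.gdb` (a hypothetical path), this yields the kind of "size / number of files" figure reported in the test tables, and recording it at archiving time gives a simple baseline for later comparison.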

Page 16

File Geodatabase

KYGEONET:

“Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.”

http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf

Page 17

SDE Geodatabase

Stored in RDBMS, so can’t be archived as a stand-alone object unless exported

Supports Historical Archiving

Commonly used among local govts. for enterprise data management

Archive export format: XML, fGDB, shapefiles

Page 18

Questions for Testing

Will pGDB XML export files round-trip between 9.1 and 9.3.1?

Will fGDB XML export files round-trip between 9.2 and 9.3.1?

Will fGDB GML round-trip within 9.3.1?

Do GDBs have added value that is not represented in shapefile exports?
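A round-trip test of this kind amounts to comparing what a geodatabase contained before export with what it contains after re-import. The following is a minimal sketch, assuming each state has already been summarized as a dictionary of element names grouped by type. All names are hypothetical, and gathering the summary from ArcGIS is outside the sketch.

```python
def round_trip_losses(before, after):
    """Map each element type to the names present before but missing after."""
    losses = {}
    for kind, names in before.items():
        missing = sorted(set(names) - set(after.get(kind, ())))
        if missing:
            losses[kind] = missing
    return losses


# Hypothetical summaries of a geodatabase before export and after re-import.
before = {
    "feature_classes": ["Parcels", "Streets"],
    "domains": ["ZoningCode"],
    "relationship_classes": ["ParcelToOwner"],
}
after = {
    "feature_classes": ["Parcels", "Streets"],
    "domains": ["ZoningCode"],
    "relationship_classes": [],
}

print(round_trip_losses(before, after))
# prints {'relationship_classes': ['ParcelToOwner']}
```

An empty result means nothing named in the "before" summary went missing; it says nothing about attribute values, which would need a separate record-level comparison.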

Page 19

Personal and File GDB Export

Export to XML

Export to shapefiles

Export to XML interface

Page 20

Personal GDB Tests

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Step                                   Result   Initial Size                    Compressed Size  Ratio
Original pGDB                          -        728 MB                          309 MB           1:2.36
Export to XML using 9.1 / Binary       Success  2.8 GB (about 4x source size)   269 MB           1:10.7
XML Import to pGDB using 9.1           Success  736 MB                          -                -
XML Import to pGDB using 9.2           FAILED   (size reached 394 MB)           -                -
XML Import to pGDB using 9.3.1         FAILED   (size reached 788 MB)           -                -
pGDB Export to Shapefiles using 9.3.1  Success  523 MB / 448 Files              -                -

Notes:

XML Import to pGDB using 9.1: attribute text for sub-domains and relationships preserved

pGDB Export to Shapefiles using 9.3.1: attribute text for sub-domains and relationship classes is lost; codes and IDs retained
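The ratio and expansion figures above follow directly from the reported sizes. The following is a minimal sketch of the arithmetic, assuming 1 GB = 1024 MB (the slides do not state the convention, but it reproduces the reported 1:10.7); the helper names are hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def compression_ratio(initial_mb, compressed_mb):
    """Return N for a ratio written as 1:N (initial size / compressed size)."""
    return initial_mb / compressed_mb


def expansion_factor(export_mb, source_mb):
    """Return how many times larger an export is than its source."""
    return export_mb / source_mb


original_mb = 728            # original pGDB
original_zipped_mb = 309     # original pGDB, compressed
xml_mb = 2.8 * MB_PER_GB     # XML export from 9.1
xml_zipped_mb = 269          # XML export, compressed

print(f"pGDB ratio 1:{compression_ratio(original_mb, original_zipped_mb):.2f}")  # 1:2.36
print(f"XML ratio  1:{compression_ratio(xml_mb, xml_zipped_mb):.1f}")            # 1:10.7
print(f"XML is {expansion_factor(xml_mb, original_mb):.1f}x the source size")    # 3.9x
```

The verbose XML export compresses far better than the binary geodatabase, so the compressed XML (269 MB) ends up smaller than the compressed original (309 MB) despite being about four times larger uncompressed.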

Page 21

pGDB Import of 9.1 XML

9.3.1 Failure Message

9.2 Failure Message

Import in progress

Page 22

pGDB Export to Shapefiles

Sub-domain attribute text is lost in the conversion to shapefile
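One way to limit this loss is to archive each code-to-text lookup as a separate table next to the shapefiles, since the codes themselves survive the export. The following is a minimal sketch in plain Python with a hypothetical domain, field name, and made-up codes; it does not use ArcGIS and is not a procedure the authors tested.

```python
import csv
import io

# Hypothetical coded-value domain: code -> descriptive text.
ROAD_SURFACE_DOMAIN = {"1": "Paved", "2": "Gravel", "3": "Unpaved"}


def write_domain_csv(domain, handle):
    """Write a domain as a two-column lookup table."""
    writer = csv.writer(handle)
    writer.writerow(["code", "description"])
    for code, description in sorted(domain.items()):
        writer.writerow([code, description])


def read_domain_csv(handle):
    """Read a lookup table back into a code -> description dictionary."""
    return {row["code"]: row["description"] for row in csv.DictReader(handle)}


def decode(records, field, domain):
    """Add a '<field>_TXT' entry holding the description; unknown codes give None."""
    return [dict(r, **{field + "_TXT": domain.get(r[field])}) for r in records]


buffer = io.StringIO()            # stands in for a sidecar file
write_domain_csv(ROAD_SURFACE_DOMAIN, buffer)
buffer.seek(0)
restored = read_domain_csv(buffer)
print(decode([{"ID": 7, "SURFACE": "2"}], "SURFACE", restored))
# prints [{'ID': 7, 'SURFACE': '2', 'SURFACE_TXT': 'Gravel'}]
```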

Page 23

pGDB Upgrade to 9.3.1

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Step                            Result   Initial Size  Compressed Size  Ratio
Original pGDB                   -        728 MB        309 MB           1:2.36
Upgraded to 9.3.1 pGDB          Success  728 MB        -                -
Export to XML                   Success  1.25 GB       -                -
XML Import to pGDB using 9.3.1  Success  738 MB        -                -

Notes:

Upgraded to 9.3.1 pGDB: upgrade using “Properties/Upgrade Geodatabase”

XML Import to pGDB using 9.3.1: functionality and content intact

Page 24

pGDB conversion to fGDB

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Step                   Result   Initial Size        Compressed Size  Ratio
Original pGDB          -        728 MB              309 MB           1:2.36
Import to 9.3.1 fGDB   Success  274 MB / 322 Files  -                -

Note:

Import to 9.3.1 fGDB: sub-domain attributes preserved; relationship classes were lost

Page 25

File GDB Tests

Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009

Step                                   Result   Initial Size                     Compressed Size  Ratio
Original fGDB                          -        224 MB / 64 Files                80.9 MB          1:2.77
Export to XML using 9.2 / Binary       Success  1.11 GB (about 5x source size)   137 MB           1:8.3
XML Import to fGDB using 9.3.1         Success  223 MB / 61 Files                -                -
fGDB Export to shapefiles using 9.3.1  Success  427 MB / 63 Files                -                -

Note:

fGDB Export to shapefiles using 9.3.1: no sub-domain attributes or relationship classes to test, but it’s documented that significant fGDB functionality and tabular data may be lost.

Page 26

GML Export

GML “Simple Features Profile” now supported by 9.3

ArcToolbox/Data Interoperability Tools: GML support available out-of-the-box to all users

Page 27

File GDB/GML Test

Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009

Step                            Result   Initial Size                   Compressed Size  Ratio
Original fGDB                   -        224 MB / 64 Files              80.9 MB          1:2.77
Export to GML using 9.3.1       -        456 MB                         60.1 MB          1:7.59
GML Import to fGDB using 9.3.1  FAILED   (reached 111 MB / 46 Files)    -                -
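A first diagnostic for a failed import could be to confirm, outside ArcGIS, that the exported file is well-formed XML and to count its features. The following is a minimal sketch using Python's standard library, assuming features are wrapped in `gml:featureMember` elements in the `http://www.opengis.net/gml` namespace; the actual export may be structured differently, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumption: GML 3.1-style namespace


def count_feature_members(source):
    """Stream through a GML file (path or binary file object) and count features.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    """
    count = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == f"{{{GML_NS}}}featureMember":
            count += 1
            element.clear()  # release the parsed feature to keep memory use low
    return count
```

If the count matches the number of features in the source geodatabase and no parse error is raised, the export itself is probably intact and the fault is more likely on the import side.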

Page 28

Page 29

Conclusions

For archival, pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported to a fGDB.

Stand-alone fGDB may be a safe archival format, following KYGEONET’s lead.

Risk: format newness & unknown future

Will feel safer after ESRI release of API.

Page 30

Future Study Needs

Round-trip fGDB via XML: Are complex functions, properties, and relationships preserved?

SDE Export Options – Best practices to preserve as much as possible via XML, fGDB, and/or shapefiles?

What’s the problem with the GML import?

Page 31

http://www.lib.ncsu.edu/ncgdap/presentations.html

Jeff Essic, Matt Sumner

Data Services Librarians

NCSU [email protected], [email protected]
