Jan 19, 2018
Preservation Strategies in the North Carolina Geospatial Data Archiving Project (NCGDAP)
NCSU Libraries
Steve Morris Head of Digital Library Initiatives
Digital Preservation in State Government: Best Practices Exchange 2006
Note: Percentages based on the actual number of respondents to each question 2
Overview
Digital geospatial data preservation issues
Technical solutions
Organizational/cultural solutions
NC Geospatial Data Archiving Project
Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
One of 8 initial NDIIPP partnerships (only state project)
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
Objective: engage existing state/federal geospatial data infrastructures in preservation
Targeted Content
Resource Types: GIS “vector” data, digital orthophotography, digital maps, tabular data
Content Producers: mostly state, local, regional; some university, commercial; selected local federal projects
Today’s geospatial data as tomorrow’s cultural heritage
Future uses of data are difficult to anticipate (as with Sanborn Maps).
Risks to Digital Geospatial Data
Producer focus on current data: time-versioned content is generally not archived
Future support of data formats in question: vast, complex range of data formats in use
Shift to “streaming data” for access: archives have been a by-product of providing access
Preservation metadata requirements: descriptive, administrative, technical, DRM
Geodatabases: complex functionality
Different Ways to Approach Preservation
Technical solutions: How do we preserve acquired content over the long term?
Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—at point of production?
Vector Data Format Options
Option A: use an open format, and accept a really unfortunate transformation and limited vendor support for the output object.
Option B: use a closed format, but retain the original content and count on short- and medium-term vendor support.
Option C: do both to buy time, and look for an open, ASCII solution (watch GML activity).
No sweet spot, just an evolving and changing mix of flawed options that are used in combination.
Preserving Cartographic Representation
Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Preserving Geodatabases
Spatial databases in general vs. ESRI Geodatabase “format”
Not just data layers and attributes, but also topology, annotation, relationships, behaviors
Growing use of geodatabases by municipal, county agencies
Some looking to Geodatabase as archive platform (in addition to feature class export)
ESRI Geodatabase archiving approaches: Feature Class Export, XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
Harnessing Geospatial Web Services
Image atlases from WMS services?
Capturing cartographic representation?
Recording records from decision-making processes?
Later: data transfer via WFS & GML? Other?
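As one way to picture the "image atlas" idea, the sketch below splits a bounding box into a grid and builds one WMS 1.1.1 GetMap request per tile. It is only an illustration under stated assumptions: the endpoint URL, layer name, grid size, and the function name getmap_urls are hypothetical and do not come from the project; only the GetMap parameter names follow the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode


def getmap_urls(base_url, layer, bbox, rows, cols, tile_width=512, srs="EPSG:4326"):
    """Return one WMS 1.1.1 GetMap URL per tile of a rows x cols grid over bbox.

    bbox is (minx, miny, maxx, maxy) in the units of srs.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols  # tile width in map units
    dy = (maxy - miny) / rows  # tile height in map units
    # Keep the pixel aspect ratio equal to the tile's map aspect ratio.
    tile_height = max(1, round(tile_width * dy / dx))
    urls = []
    for r in range(rows):
        for c in range(cols):
            tile = (minx + c * dx, miny + r * dy,
                    minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty value requests the server's default style
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": tile_width,
                "HEIGHT": tile_height,
                "FORMAT": "image/png",
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Hypothetical endpoint and layer, for illustration only.
atlas = getmap_urls("https://example.org/wms", "parcels",
                    (-84.0, 34.0, -76.0, 36.0), rows=2, cols=4)
```

Each URL could then be fetched and stored alongside the request parameters and capture date, so that the rendered cartography is kept as a static image even if the service later changes.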
Project Repository Approach
Interest in how geospatial content interacts with widely available digital repository software
Focus on salient, domain-specific issues
Challenge: remain repository agnostic
Avoid “imprinting” on repository software environment
Preservation package should not be the same as the ingest object of the first environment
Tension between exploiting repository software features vs. becoming software dependent
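A repository-agnostic package can be pictured as a plain directory plus a manifest that any future system could read. The sketch below is an assumption for illustration, not the project's actual packaging method: the manifest.json file name, its fields, and both function names are hypothetical. It records each file's relative path, size, and SHA-256 checksum using only the Python standard library.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large geospatial files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(package_dir):
    """Write a manifest listing every file in package_dir and return the manifest dict."""
    root = Path(package_dir)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == MANIFEST_NAME:
            continue  # do not describe the manifest inside itself
        files.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    manifest = {"files": files}
    (root / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Because the manifest is plain JSON next to the data, it does not depend on the ingest format of whichever repository software receives the package first.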
Organizational/Cultural Approaches
Take the data as is, in the manner in which it can be obtained
Provide feedback to producer organizations/inform state geospatial infrastructure
“Wrangle” and archive data
Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with industry and infrastructure are more important than the archive
Points of Engagement with Spatial Data Infrastructure
Where do archiving and preservation fit in?
Framework data communities: snapshot frequency, naming schemes, classification, GML application schemas, format strategies
Metadata standards and outreach: persistent identifiers, versioning, feedback on metadata quality
Content replication/transfer: for data improvement projects, disaster preparedness, aggregation by regional service providers, … and archives
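To make the naming-scheme and versioning points concrete, here is a minimal sketch of one possible convention for time-versioned snapshots. The pattern (producer, layer, ISO date, extension) and the function name snapshot_name are assumptions chosen for illustration, not a scheme adopted by the project or by any framework data community.

```python
import re
from datetime import date


def snapshot_name(producer, layer, snapshot_date, extension):
    """Build a predictable file name: producer_layer_YYYY-MM-DD.ext (hypothetical scheme)."""
    def slug(text):
        # Lowercase and collapse anything that is not a letter or digit into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lower().lstrip(".")
    return f"{slug(producer)}_{slug(layer)}_{snapshot_date.isoformat()}.{ext}"


# Illustrative values only.
names = [
    snapshot_name("Wake County", "Parcels", date(2006, 7, 1), "shp"),
    snapshot_name("Wake County", "Parcels", date(2005, 1, 15), "shp"),
]
```

Putting the ISO date last means that, for a given producer and layer, a plain alphabetical sort of the names is also a chronological sort of the snapshots.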
Points of Engagement with the Open Geospatial Consortium (OGC)
Geography Markup Language (GML) for archiving (PDF/A version of GML?)
GeoDRM: adding preservation use cases
Content Packaging: will there be an industry solution?
Web Map Context Documents: can we save data state as well as application state?
Content Replication: is this a layer in the overall architecture?
Persistent Identifiers
Points of Engagement with Industry
Software vendors: better support for temporal data management; tools for retrospective data conversion
Web mashup and open source communities: WMS caching schemes; standard tiling schemes with temporal component?
Data vendors: cultivate market for older data (scaled pricing?); tech transfer on archiving practices?
Project Status
Cultivating a market for older data.
Project Status
Cultivating tools for retrospective conversion.
Conclusion
Geospatial data is complex, introducing manifold challenges to ingest processes and repository development
Vector data and spatial databases are especially complex
Geospatial data exists in very large quantities and is subject to frequent update
Need to engage industry in the solution
Need to engage point of production
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU [email protected]
Web site: http://www.lib.ncsu.edu/ncgdap/