NOAA Data Management Activities
Deirdre Jones, EDMC Chair
Jeff de La Beaujardière, DM Architect
Prepared for DAARWG, 2011-11-15
Outline
• Motivation
• Recent EDMC Accomplishments
• EDMC FY2012 Plans
• DM Framework in NEO Strategy
• Data catalog approaches
Motivation
• NOAA Strategic Plan calls for:
  – Improved data interoperability and usability through application and use of common data management standards
  – Enhanced access and use of environmental data through data storage and access solutions, integration of systems, and long-term stewardship
  – Increased volume and diversity of data and information effectively integrated into models
New EDMC Procedural Directives
• Data Management Planning
  – Directs managers of all projects and systems that produce data to write DM Plans
• Data Documentation
  – Directs NOAA programs to provide data documentation (metadata)
• Data Sharing by NOAA Grantees
  – Directs NOAA grantees to make their data publicly available
All three are agenda topics for tomorrow.
EDMC Plans for FY2012 (1/2)
• Implement approved procedural directives
  – EDMC developing detailed work plan
  – Further discussion tomorrow
• Begin to develop additional Procedural Directives
  – Data Access and Discovery
    • Goal: Enable users to find and retrieve NOAA data
    • Goal: Automate publication of NOAA data to data.gov and GEOSS
  – Data Citation
    • Goal: Enable datasets to be referenced by a unique identifier to provide credit, enable usage metrics, and distinguish duplicates
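The Data Citation goal says one identifier per dataset should do three jobs: give credit, support usage metrics, and distinguish duplicates. The Python sketch below illustrates the last two under stated assumptions. The identifier values, URLs, and helper names (`group_duplicates`, `usage_metrics`) are hypothetical. Copies that share an identifier are treated as the same dataset, and citations are counted per identifier.

```python
from collections import Counter, defaultdict

# Hypothetical copies of datasets held at different locations. Each copy carries
# the unique identifier assigned to its dataset (all values are made up).
copies = [
    ("https://a.example.gov/sst.nc", "example-id:sst-v1"),
    ("https://b.example.gov/mirror/sst.nc", "example-id:sst-v1"),
    ("https://a.example.gov/wind.nc", "example-id:wind-v2"),
]

# Hypothetical citations harvested from papers or reports, each naming an identifier.
citations = ["example-id:sst-v1", "example-id:sst-v1", "example-id:wind-v2"]


def group_duplicates(copies):
    """Group locations by identifier: copies sharing an ID are the same dataset."""
    groups = defaultdict(list)
    for location, dataset_id in copies:
        groups[dataset_id].append(location)
    return dict(groups)


def usage_metrics(citations):
    """Count citations per identifier, a simple usage metric that credits each dataset."""
    return Counter(citations)


if __name__ == "__main__":
    print(group_duplicates(copies))
    print(usage_metrics(citations).most_common())
```

Both functions depend on the identifier being stable and unique. That is why the identifier, and not the file location, is used as the key.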
EDMC Plans for FY2012 (2/2)
• Hold 3rd annual NOAA-wide EDM Conference
  – To engage stakeholders
• Host OGC Workshop
  – Coordination on data access standards
• Support DAARWG Meetings (twice annually)
  – To receive guidance from advisory board
• Support development of Archive Concept of Operations
  – Called for in CLASS External Review
  – Briefing after lunch today
Data Management Framework from
National Earth Observations (NEO) Strategy, ch. 4 (inter-agency draft)
Jeff de La Beaujardière, PhD, NOAA DM Architect
Data Management Framework
[Figure: framework components: Principles, Governance, Architecture, Standards, Assessment, and the Data Lifecycle]
Principles
• Full and Open Access
• Preservation
• Information Quality
• Ease of Use
Data Lifecycle
• Planning and Production Activities
• Data Management Activities
• Usage Activities
from National Earth Observations (NEO) Strategy, Data Management Chapter (in preparation, 2011)
Data Lifecycle
[Figure: a cycle through Planning and Production Activities, Data Management Activities, and Usage Activities. Activities shown, in figure order: Collection, Processing; Quality Control, Documentation; Cataloging, Dissemination, Preservation, Stewardship; Usage Tracking, Final Disposition; Discovery, Reception, Analysis; Product Generation, User Feedback; Citation, Tagging; Gap Assessment; Requirements Definition, Planning; Development, Deployment, Operations]
Applicability of EDMC Directives
[Figure: the Data Lifecycle annotated with the directive topics that apply along it: Data Documentation, DM Planning, Data Sharing, What-to-Archive, Cataloging, Data Citation, Data Services]
Some of the possible feedback loops in the Data Lifecycle
(proposed) NOAA Data Catalog Approach
Jeff de La Beaujardière, PhD, NOAA DM Architect
Catalog Goals
• Users can find NOAA data for a desired phenomenon, location, and time
  – Without knowing the Office/Program structure
  – A single starting point to find data that are accessible via web services and well documented
• Data providers can register their services once, in a community catalog
  – And have their data be visible in a master catalog
• NOAA leadership can see improvements in NOAA data discovery & access
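The first catalog goal is finding data by phenomenon, location, and time. In query terms that means a keyword match, a bounding-box overlap, and a time-range overlap. The Python sketch below shows that logic over an in-memory list. The `Record` fields, the example records, and the function names are hypothetical. A real catalog would run such queries through its own service interface and index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """Hypothetical catalog metadata record; field names are illustrative only."""
    title: str
    keywords: frozenset  # phenomenon terms, ideally from a controlled vocabulary
    bbox: tuple          # (west, south, east, north) in decimal degrees
    start: date
    end: date


def bbox_intersects(a, b):
    """True if two (west, south, east, north) boxes overlap; ignores the dateline."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, phenomenon, bbox, start, end):
    """Records tagged with the phenomenon that overlap the box and the time range."""
    return [
        r for r in records
        if phenomenon in r.keywords
        and bbox_intersects(r.bbox, bbox)
        and r.start <= end and start <= r.end
    ]


records = [
    Record("Example SST dataset", frozenset({"sea surface temperature"}),
           (-98.0, 18.0, -80.0, 31.0), date(2010, 1, 1), date(2011, 6, 30)),
    Record("Example wind dataset", frozenset({"wind speed"}),
           (-180.0, 50.0, -130.0, 72.0), date(2005, 1, 1), date(2011, 1, 1)),
]

hits = search(records, "sea surface temperature",
              (-90.0, 25.0, -85.0, 30.0), date(2010, 5, 1), date(2010, 8, 1))
print([r.title for r in hits])
```

The user supplies only a phenomenon, a place, and a time. No Office or Program name appears in the query, which matches the goal of hiding the organizational structure.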
Some Existing Community-Specific Catalogs
• IOOS Catalog
• UAF Catalog
• NGDC Geoportal
• NODC Geoportal
• CWIC CLASS Catalog
• GeoPlatform (ArcGIS.com Portal)
• NCDC Geoportal
Conceptual NOAA Distributed Catalog Architecture
[Figure: data and services are registered in community catalogs (NCDC, NODC, IOOS, UAF, NGDC, others...), possibly colocated with the Archive (see Archive ConOps). The NOAA Master Catalog is linked to the community catalogs by federated search (or scheduled harvest). Users & clients reach it through the NOAA Web Site UI and through APIs for data.gov, GEOSS, and analysis tools.]
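The architecture leaves open whether the master catalog uses federated search or scheduled harvest to reach the community catalogs. The Python sketch below contrasts the two with in-memory stand-ins. The class names, catalog names, and record titles are hypothetical. Federated search queries every community catalog at request time. A harvested index copies records on a schedule and answers from that local copy.

```python
class CommunityCatalog:
    """Hypothetical community catalog holding record titles and answering queries."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def search(self, term):
        return [r for r in self.records if term.lower() in r.lower()]


def federated_search(catalogs, term):
    """Federated search: query every community catalog at request time, merge hits."""
    return [(cat.name, r) for cat in catalogs for r in cat.search(term)]


class HarvestedIndex:
    """Scheduled harvest: copy all records periodically, then search the local copy."""

    def __init__(self):
        self.index = []

    def harvest(self, catalogs):
        # Replace the local snapshot with the current contents of every catalog.
        self.index = [(cat.name, r) for cat in catalogs for r in cat.records]

    def search(self, term):
        return [(n, r) for n, r in self.index if term.lower() in r.lower()]


catalogs = [CommunityCatalog("Catalog A", ["Buoy temperature"]),
            CommunityCatalog("Catalog B", ["Station temperature", "Precipitation"])]
master = HarvestedIndex()
master.harvest(catalogs)                          # e.g. run nightly
catalogs[0].records.append("Glider temperature")  # new record after the harvest

print(len(federated_search(catalogs, "temperature")))  # 3: sees the new record
print(len(master.search("temperature")))                # 2: stale until next harvest
```

This shows the trade-off in small form. Federated search is always current, but it depends on every community catalog being reachable at query time. Harvesting is faster and more robust, but it lags behind the sources by up to one harvest interval.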
Data Management Overview Graphic: Connections and Information Flow [OV-2]
[Figure: the actors Data Producer, Data User, and NOAA Leadership are connected to DM Plan*, Data Documentation* (Metadata), Archive Decision*, Archive, OAIS Reference Model, Data Access Service, Catalog Service, Data Inventory, Metrics Dashboard, Requirements Gap Assessment, Tools, ID, and Result (paper, decision, policy, response). The connecting activities include create, write, assess, guide, preserve, add, publish*, register, find, get, understand, analyze, use, cite, compile, and measure.]
* topic of current EDMC Directive
(Note: Not all activities illustrated)
BACKUP SLIDES
DM Principles from NEO Strategy
Principles
• Full and Open Access: Earth observations should be made fully and openly available to all users promptly, in a non-discriminatory manner, and free of charge.
• Preservation: Earth observations should be managed as an asset and preserved for future use.
• Information Quality: Earth observations should be of known quality and fully documented.
• Ease of Use: Earth observations should be easily discoverable and accessible online using interoperable services and standardized formats that encourage the broadest possible use.
from National Earth Observations (NEO) Strategy, Data Management Chapter (in preparation, 2011)
Procedural Directive: Data Management Planning (DMP)
• Summary
  – Directs managers of all projects and systems that produce data to write DM Plans
• Provides guidance on content of DM Plans, including:
  – General description of the data
  – Data documentation and standards
  – Data access methods
  – Initial data storage and long-term preservation
• Provides a DMP template and FAQs
• Feedback
  – Hundreds of comments through briefings, workshops, and meetings shaped principles, concepts, and final text
  – 117 comments received during official 30-day comment period
• EDMC approval was unanimous
Procedural Directive: Data Documentation
• Summary
  – Directs NOAA programs to provide data documentation (metadata)
  – Requires use of ISO 19115/19139
• Provides guidance on metadata content, including:
  – Metadata for Discovery
  – Metadata for Use
  – Metadata and Documentation for Understanding
  – Documentation of Collections
  – Documentation of Datasets
  – Documentation of Services
• Highlights metadata resources, tools and challenges
• EDMC approval was unanimous
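The Data Documentation directive requires ISO 19115 content encoded as ISO 19139 XML. As an illustration of "Metadata for Discovery", the Python sketch below reads the title and abstract from such a record using the standard gmd/gco namespaces. The sample record is hypothetical and heavily trimmed, so it omits many elements a valid record must contain. The helper name `discovery_fields` is also illustrative.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 for the gmd and gco element sets.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, trimmed record: only the elements this sketch reads.
RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Example dataset title</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract><gco:CharacterString>Example abstract.</gco:CharacterString></gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

# Paths relative to the MD_Metadata root element.
TITLE_PATH = ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
              "gmd:CI_Citation/gmd:title/gco:CharacterString")
ABSTRACT_PATH = ("gmd:identificationInfo/gmd:MD_DataIdentification/"
                 "gmd:abstract/gco:CharacterString")


def discovery_fields(xml_text):
    """Pull basic discovery fields; a missing element yields None."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(TITLE_PATH, namespaces=NS),
        "abstract": root.findtext(ABSTRACT_PATH, namespaces=NS),
    }


print(discovery_fields(RECORD))
```

A catalog that ingests records in this standard encoding can extract discovery fields with the same paths for every provider. That uniformity is the practical payoff of requiring a single metadata standard.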
Procedural Directive: Data Sharing by NOAA Grantees
• Summary
  – Directs NOAA grantees to make their environmental data publicly available
  – Requires a data sharing plan to be provided with new proposals and published at award
  – Data must be shared in a "timely" fashion, but no later than two years after collection
  – Exceptions or extensions granted for legal reasons or on a case-by-case basis upon request
• Provides guidance on data sharing plans
  – Includes metadata
  – FAQs and template
• Feedback
  – EDMC approval
  – Feedback from Cooperative Institutes and Sea Grant Program
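The grantee directive sets an outer limit: data must be shared no later than two years after collection. The Python sketch below computes that date, assuming "two years" means two calendar years from the collection date. It also assumes a February 29 collection date maps to February 28 when needed. Both are interpretations, not text from the directive, and the exceptions and extensions it allows are not modeled. The function names are hypothetical.

```python
from datetime import date


def latest_sharing_date(collected: date, years: int = 2) -> date:
    """Last date data collected on `collected` may remain unshared.

    Assumption: the limit counts calendar years, and a Feb 29 collection
    date maps to Feb 28 when the target year is not a leap year.
    """
    try:
        return collected.replace(year=collected.year + years)
    except ValueError:  # Feb 29 and the target year has no Feb 29
        return collected.replace(year=collected.year + years, day=28)


def is_overdue(collected: date, today: date, shared: bool) -> bool:
    """True if the data are still unshared after the limit has passed."""
    return not shared and today > latest_sharing_date(collected)


print(latest_sharing_date(date(2011, 11, 15)))
print(is_overdue(date(2011, 1, 1), date(2013, 1, 2), shared=False))
```

"Timely" sharing is expected well before this date. The computed value is only the latest acceptable point.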
Good Data Management supports NOAA Leadership Priorities
[Figure: NOAA Data + Good Documentation + Standardized Services, together with a Data Inventory, a Metrics Dashboard, and a Data Catalog, enable the ability to find, access, and understand NOAA data and visibility in data.gov and GEOSS. These in turn enable selected NOAA Leadership Priorities for NOAA data.]
Tagging Concept
[Figure: the NOAA Master Catalog holds metadata records and a separate Tag Database. Tags (DWH, data.gov, GEOSS CORE, GEOSS StP, Purpose E, Purpose F) link records to external catalogs or portals such as DWH Response, data.gov, GEOSS Data CORE, and other portals.]
Tags are not inserted into metadata records by data providers. Instead, the Catalog adds tags to indicate datasets relevant to a particular purpose. Datasets with a relevant tag are recorded by external catalogs.
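The tagging concept keeps purpose tags outside the providers' metadata records, in a separate tag database owned by the catalog. The Python sketch below models that separation in memory. The record IDs, titles, and function names are hypothetical, and the tag names are taken from the figure. Adding a tag never modifies a record. An external catalog asks for a tag and receives the matching records unchanged.

```python
# Metadata records exactly as supplied by providers (hypothetical IDs and titles).
# Providers never write purpose tags into these records.
records = {
    "rec-001": {"title": "Example dataset A"},
    "rec-002": {"title": "Example dataset B"},
    "rec-003": {"title": "Example dataset C"},
}

# Separate tag database maintained by the catalog: tag -> set of record IDs.
tag_db = {}


def add_tag(tag, record_id):
    """Catalog operators mark a record as relevant to a purpose."""
    if record_id not in records:
        raise KeyError(f"unknown record {record_id}")
    tag_db.setdefault(tag, set()).add(record_id)


def records_for(tag):
    """What an external catalog or portal would pick up: records with this tag."""
    return {rid: records[rid] for rid in sorted(tag_db.get(tag, set()))}


add_tag("DWH", "rec-001")
add_tag("data.gov", "rec-002")
add_tag("data.gov", "rec-003")
print(list(records_for("data.gov")))  # ['rec-002', 'rec-003']
```

Because tags live outside the records, a new purpose such as "Purpose E" can be added without asking any provider to re-publish metadata.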
Potential Relationship of GeoPlatform to NOAA Master Catalog
A) No relation: the Master Catalog draws on community catalogs (Cat. 1, Cat. 2), while GeoPlatform (Map Svcs Only) draws separately on community catalogs of map services (WMS 1, WMS 2)
B) GeoPlatform is Master Catalog: community catalogs (Cat. 1, Cat. 2 ... Cat. N) feed GeoPlatform (Map & Data Svcs)
C) GeoPlatform feeds Master Catalog: community catalogs (Cat. 1, Cat. 2) and GeoPlatform (Map Svcs Only) feed the Master Catalog (Map & Data Svcs)
D) Master Catalog feeds GeoPlatform: community catalogs (Cat. 1, Cat. 2 ... Cat. N) feed the Master Catalog (Map & Data Svcs), which feeds GeoPlatform (Map Svcs Only)
GeoPlatform and Master Catalog working together
[Figure: the NOAA Master Catalog (Geoportal or t.b.d.), with a web-based map viewer UI, draws on community catalogs (Catalog 1, Cat. 2, Catalog 3) of gridded data and services. It also draws on GCMD, a WAF, other catalogs, a list of WMS, and a list of manual registrations. Connections labeled CS/W and Other API link it to GeoPlatform (ArcGIS.com Portal), data.gov, and GEOSS. ArcGIS Server, Shapefile, and KML content reaches GeoPlatform by manual registration.]
Some datasets might be registered directly in GeoPlatform.
UAF Distributed Catalog Architecture
[Figure: project data & services, served via DAP, are described in project THREDDS catalogs. The Unified Access Framework (UAF) Catalog, a THREDDS community catalog, collects the project catalogs and offers an API to analysis tools such as Matlab, IDV, ArcGIS, and ERDDAP.]
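In the UAF layout, each project publishes a THREDDS catalog that points to DAP endpoints. The Python sketch below shows how a client or aggregator might resolve those endpoints. It joins an OPeNDAP service's `base` with each dataset's `urlPath`. The sample catalog, server URL, and function name are hypothetical. The sketch also handles only the simplest case: services declared at top level and referenced by a `serviceName` attribute, with no inherited metadata and no nested catalogRefs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical, trimmed project catalog: one OPeNDAP service and two datasets.
CATALOG = f"""<catalog xmlns="{THREDDS_NS}" name="Example project catalog">
  <service name="dap" serviceType="OPeNDAP" base="/thredds/dodsC/"/>
  <dataset name="Example SST" urlPath="project/sst.nc" serviceName="dap"/>
  <dataset name="Example winds" urlPath="project/winds.nc" serviceName="dap"/>
</catalog>"""


def dap_endpoints(catalog_xml, catalog_url):
    """Map dataset name -> DAP URL by joining the OPeNDAP service base and urlPath."""
    root = ET.fromstring(catalog_xml)
    bases = {
        s.get("name"): s.get("base")
        for s in root.iter(f"{{{THREDDS_NS}}}service")
        if s.get("serviceType") == "OPeNDAP"
    }
    endpoints = {}
    for ds in root.iter(f"{{{THREDDS_NS}}}dataset"):
        base = bases.get(ds.get("serviceName"))
        path = ds.get("urlPath")
        if base and path:
            # The base is server-relative, so resolve it against the catalog's URL.
            endpoints[ds.get("name")] = urljoin(catalog_url, base + path)
    return endpoints


print(dap_endpoints(CATALOG, "https://example.gov/thredds/catalog.xml"))
```

Tools such as those listed in the figure can then open the resulting DAP URLs directly, whichever project catalog the dataset came from.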
Use Google instead of a Dedicated Catalog?
[Figure: Google and other search engine crawlers index project data & services directly, relying on an agreed convention to identify geodata servers (e.g., /geodata.xml). The roles of the NOAA Web Site, Community Catalogs, data.gov, and GEOSS, and how users & clients would reach the data, are marked "?".]
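The search-engine option depends on an agreed convention for identifying geodata servers. The slide gives /geodata.xml only as an example, and no such convention is defined here. The Python sketch below shows the crawler-side check under that hypothetical path. For any page on a server, it looks for the well-known file at the server root. The `fetch` callable is supplied by the caller, so the logic can be tried without network access. The function names and URLs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical well-known path, taken from the slide's example.
WELL_KNOWN_PATH = "/geodata.xml"


def geodata_url(page_url):
    """Build the well-known URL at the root of the server hosting page_url."""
    return urljoin(page_url, WELL_KNOWN_PATH)


def find_geodata_servers(page_urls, fetch):
    """Return {page URL: well-known URL} for servers where the file exists.

    `fetch(url)` is supplied by the caller and returns the document text,
    or None if the file is absent.
    """
    found = {}
    for page in page_urls:
        url = geodata_url(page)
        if fetch(url) is not None:
            found[page] = url
    return found


# Offline stand-in for HTTP: only server "a" publishes the file.
fake_web = {"https://a.example.gov/geodata.xml": "<geodata/>"}
print(find_geodata_servers(
    ["https://a.example.gov/some/page.html", "https://b.example.gov/"],
    fake_web.get,
))
```

A fixed path at the server root lets a crawler test for a geodata server with a single request. The cost is that every provider must agree to publish the file at that location.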
Probably want both formal catalog & search engine support
[Figure: project data & services are listed in community catalogs (Geoportal Server, GeoNetwork, WAF, THREDDS Catalog) that feed the NOAA Master Catalog (machine API, spatial & temporal queries, controlled vocabularies). The Master Catalog serves external catalogs through its API and users & clients through the NOAA Web Site UI. Google (free-text search) gives general users a simple search.]