Metadata Registries Workshop
U.S. Bureau of Labor Statistics Conference Center
April 15-17, 1998
SPONSORS
National Committee for Information Technology Standards (NCITS) L8, Data Representation
U.S. Environmental Protection Agency
U.S. Census Bureau
U.S. Bureau of Labor Statistics
U.S. Department of Transportation Intelligent Transportation Systems Joint Program Office
U.S. Department of Defense - Health System, Health Data Administration Program
National Institute of Standards and Technology
ORGANIZERS
Bruce Bargmeyer - U.S. Environmental Protection Agency
Cathryn Dippo - U.S. Bureau of Labor Statistics
Daniel Gillman - U.S. Census Bureau
William P. LaPlant, Jr. - U.S. Census Bureau
Douglas Mann - Battelle Memorial Institute
Judith Newton - National Institute of Standards and Technology
Phong Ngo - SAIC
CDR. Robert W. Mayes, R.N. - Health Care Financing Administration (HCFA)
Burton Parker - Paladin Integration Engineering
Andrew M. Shoka - MITRETEK Systems
EPA Information and Data Management SDC-0055-057-JE-7031
UNITED STATES ENVIRONMENTAL PROTECTION AGENCY
Workshop Goals
Share knowledge and experience
Focus on metadata registration standards
  ISO/IEC 11179, Specification and Standardization of Data Elements
  DpANS X3.285, Metamodel for the Management of Sharable Data
Discuss implementations based on these standards
Workshop Goals
Facilitate collaborative efforts
  Metadata registry development
  Metadata exchange between registries
Standardize content
  Traditional data
  Terminology
  Unify text and data
Next generation registry standards
  XML, RDF Schema, XML-Data (content model?)
Standards for Data Administration
Data Element Definitions
ISO/IEC 11179, Part 4
Bruce Bargmeyer
U.S. Environmental Protection Agency
Tel: (202) 260-5306
Internet: [email protected]
http://sdct-sunsrv1.ncsl.nist.gov/~bargmeye
Challenges
Data element definitions and descriptions are not sufficient to support reuse or multiple users of data
Finding one standard data element among thousands is difficult or impossible without classification schemes, thesaurus structures and other reference guides
Need to focus data standardization on the definition and domain values rather than names
Types of Definitions
Definitions can be: stipulative, precising, persuasive, intensional, extensional, lexical, ...
A type of definition for data elements:
A word or phrase expressing the essential nature of a person or thing or class of persons or things: an answer to the question “what is x?” or “what is an x?” ... (Webster’s Third New International Dictionary Unabridged, 1986)
Data Definition Rules
A data definition shall:
  Be unique (within a data dictionary)
  Be stated in the singular
  State what the concept is, rather than what it is not
  Be stated as a descriptive phrase or sentence(s)
  Contain only commonly understood abbreviations
  Be expressed without embedding definitions of other data elements or underlying concepts
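A few of these rules lend themselves to mechanical checks. The following Python sketch is illustrative only: the function name, the abbreviation allow-list, and the heuristics are assumptions made for this example, not part of ISO/IEC 11179. Rules such as "stated in the singular" or "no embedded definitions" need human review and are not attempted here.

```python
import re
from typing import Dict, List

# Hypothetical allow-list; a real registry would maintain its own.
COMMON_ABBREVIATIONS = {"ID", "ZIP"}

# Hypothetical phrases that suggest a definition says what a concept is not.
NEGATIVE_OPENINGS = ("not ", "anything other than ", "anything that is not ")


def check_definition(name: str, definition: str, dictionary: Dict[str, str]) -> List[str]:
    """Return rule violations found by simple, illustrative heuristics."""
    problems: List[str] = []
    text = definition.strip()

    # Rule: be unique within the data dictionary.
    for other_name, other_definition in dictionary.items():
        if other_name != name and other_definition.strip().lower() == text.lower():
            problems.append(f"duplicates the definition of {other_name!r}")

    # Rule: state what the concept is, rather than what it is not.
    if text.lower().startswith(NEGATIVE_OPENINGS):
        problems.append("stated in the negative")

    # Rule: contain only commonly understood abbreviations.
    # Heuristic: treat any all-capitals token of two or more letters as an abbreviation.
    for token in re.findall(r"\b[A-Z]{2,}\b", text):
        if token not in COMMON_ABBREVIATIONS:
            problems.append(f"possibly obscure abbreviation {token!r}")

    return problems


dictionary = {
    "Employee Number": "A number assigned to an employee.",
    "Staff Number": "A number assigned to an employee.",
    "Birth Date": "The date on which a person was born.",
}
```

Calling `check_definition("Staff Number", dictionary["Staff Number"], dictionary)` reports the duplicate, while the "Birth Date" entry passes all three checks.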
Data Definition Guidelines
A data definition should:
  State the essential meaning of the concept
  Be precise and unambiguous
  Be concise
  Be able to stand alone
  Be expressed without embedding rationale, functional usage, domain information or procedural information
  Avoid circular reasoning
  Use consistent terminology and structure for related definitions
Status
ISO/IEC 11179, Part 4 - Rules and Guidelines for the Formulation of Data Definitions
Passed International Standard ballot in 1994
Published as an International Standard in 1995
Epilog
There is useful information that is not included in the definition.
  Purpose of collection
  Statistical method of collection
  Data values (domain), usage, ...
DpANS X3.285 extends data attribution to include some of the useful information left out of a definition.
  Basic attributes
  Extensible set of attributes
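One way to picture the split between a definition and the other useful information is a record that stores each in its own attribute. The Python sketch below is a minimal illustration; the class and field names are hypothetical and are not taken from DpANS X3.285.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegisteredDataElement:
    """Illustrative registry record; field names are assumptions, not X3.285 attribute names."""

    name: str
    definition: str  # kept free of rationale, usage, and domain values
    # A few fixed attributes for information left out of the definition.
    purpose_of_collection: str = ""
    collection_method: str = ""
    permissible_values: List[str] = field(default_factory=list)
    # Extensible set: anything else a registry chooses to record.
    extra_attributes: Dict[str, str] = field(default_factory=dict)


element = RegisteredDataElement(
    name="Employee Birth Date",
    definition="The date on which an employee was born.",
    purpose_of_collection="Benefit eligibility determination",
    extra_attributes={"steward": "Human Resources"},
)
```

Because the purpose and the steward live in their own attributes, the definition can stay concise and stand alone.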
CASE Tools and Metadata Registries
Many CASE tools do not have a place to store the definition as a separate attribute.
“Description” can be a jumble of things
We are working to incorporate the X3.285 metamodel into the designs of CASE tools and registries.
Standards for Data Administration
Data Element Classification
ISO/IEC 11179, Part 2
Bruce Bargmeyer
U.S. Environmental Protection Agency
Tel: (202) 260-5306
Internet: [email protected]
http://sdct-sunsrv1.ncsl.nist.gov/~bargmeye
Data Elements - Fundamentals
[Diagram. Boxes: Data Element Concept, Data Element, Value Domain. Labels: Object Class, Property, Representation. Also shown: Core Data Element, Application Data Element.]
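The diagram names the building blocks but its connecting lines did not survive. The Python sketch below assumes the usual ISO/IEC 11179 reading: a data element concept pairs an object class with a property, and a data element pairs that concept with a representation drawn from a value domain. The class and field names are hypothetical, and the core versus application data element distinction is not modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectClass:
    name: str  # e.g. "Employee"


@dataclass(frozen=True)
class Property:
    name: str  # e.g. "Marital Status"


@dataclass(frozen=True)
class DataElementConcept:
    object_class: ObjectClass
    prop: Property


@dataclass(frozen=True)
class ValueDomain:
    representation: str  # e.g. "Code" or "Name"
    permissible_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    def name(self) -> str:
        """Compose a name from object class, property, and representation."""
        return (
            f"{self.concept.object_class.name} "
            f"{self.concept.prop.name} "
            f"{self.value_domain.representation}"
        )


# One concept, two representations: two distinct data elements.
concept = DataElementConcept(ObjectClass("Employee"), Property("Marital Status"))
coded = DataElement(concept, ValueDomain("Code", ("S", "M", "D", "W")))
named = DataElement(concept, ValueDomain("Name", ("Single", "Married", "Divorced", "Widowed")))
```

Here `coded` and `named` share the same concept but differ in value domain, which is the situation classification and registration are meant to make visible.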
Utility of Data Element Classification
Helps to locate one data element among many (thousands)
Helps to design similar data elements in a uniform manner
Helps to resolve synonym and homonym problems
Provides context not possible to put into a definition
Provides definitions for words found in data element definitions and names
Classification Structures
What forms can classification take?
  Keywords
  Controlled word lists
  Terms from models
  Thesaurus
  Taxonomy
  Ontology
    Acyclic directed graph, lattice
    Multiple inheritance
Schemes
Library of Congress keywords
General European Multilingual Environmental Thesaurus (GEMET)
Integrated Taxonomic Information System (ITIS) - biological
Bill Kenworthey’s taxonomy of common abstract unit nouns
Classification - Fundamental Notions
Each node in a classification structure is a taxon (plural: taxa).
Given a classification structure, any taxa relating to a data element can be recorded
The taxa can be recorded in a separate “classification” attribute
With adequate software, users could access and navigate the classification structure
A nonintelligent identifier for each taxon helps to deal with change
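These notions can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Taxon` class, the counter used to issue meaningless identifiers, the sample taxa, and the dictionary standing in for the "classification" attribute. Taxa may have several parents, so the structure is an acyclic directed graph rather than a strict tree.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set

_next_id = itertools.count(1)  # issues identifiers that carry no meaning


@dataclass
class Taxon:
    label: str
    parents: List["Taxon"] = field(default_factory=list)
    taxon_id: int = field(default_factory=lambda: next(_next_id))


def ancestor_ids(taxon: Taxon) -> Set[int]:
    """Collect the identifiers of every taxon above the given one."""
    found: Set[int] = set()
    stack = list(taxon.parents)
    while stack:
        current = stack.pop()
        if current.taxon_id not in found:
            found.add(current.taxon_id)
            stack.extend(current.parents)
    return found


def find_elements(target: Taxon, classification: Dict[str, List[int]],
                  index: Dict[int, Taxon]) -> List[str]:
    """Names of data elements classified at the target taxon or anywhere beneath it."""
    hits = []
    for element_name, taxon_ids in classification.items():
        for tid in taxon_ids:
            if tid == target.taxon_id or target.taxon_id in ancestor_ids(index[tid]):
                hits.append(element_name)
                break
    return sorted(hits)


environment = Taxon("Environment")
health = Taxon("Public health")
water = Taxon("Water", [environment])
drinking = Taxon("Drinking water", [water, health])  # two parents
index = {t.taxon_id: t for t in (environment, health, water, drinking)}

# The separate "classification" attribute: data element name -> taxon identifiers.
classification = {
    "Lead Concentration Quantity": [drinking.taxon_id],
    "Stream Flow Rate": [water.taxon_id],
}

# Relabelling a taxon does not disturb the recorded identifiers.
drinking.label = "Potable water"
```

Searching from `environment` finds both data elements, while searching from `health` finds only the one classified under the relabelled drinking-water taxon.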
Status
ANSI & ISO
  Final committee draft is out for JTC1 ballot
Continuing R&D
  Concept is evolving
  Search engines
  Middleware - agents, mediators, request brokers
  XML tags
Relationship to terminology management