Architecting a Corporate Metadata Repository at the U.S. Bureau of Census

Gail Wright
CMR Program Manager
Technical Director
Oracle Corporation
[email protected]

Agenda
- Why a CMR?
- What to include in a CMR?
- Architecting a CMR
- Leveraging a CMR
BOC Current Business Process
Does not include an Integrated Metadata Business Process

[Figure: the current business process for Census 2000, the American Community Survey, Demographic Surveys, the Econ Census, and Econ Surveys, with the systems used in each phase]
Design: internally developed systems, customized commercial systems, CASES, variety of programming languages, GIDS, individual tool of choice
Collect: CATI, CAPI, Mail, PAPI, OCS, ICM, CADE, CSAQ, OCR, TDE, PFIRS
Process: internally developed systems, SAS, DEVSURV, COBOL, FORTRAN, DECForms, StEPS, ECON DW, individual tool of choice
Share: DADS/AFF, CENSAS, FERRET, Econ DW, CD-ROM, Internet, ISS (future)
What are the problems with the current Business Process?

- Difficult to:
  - meet customer demands for quick turnaround of surveys, and customized products
  - re-use and share metadata within the BOC
  - maintain consistent standards
  - compile and format metadata needed by dissemination systems
  - share metadata with external agencies, participate in Virtual Statistical Agencies, etc.
  - meet new metadata requirements like FGDC’s CSDGM content standard
  - perform time series or cross dataset comparisons
- Metadata integrity and quality can be compromised
BOC Goal: An Integrated Metadata Process

[Figure: Census and Survey Design, Data Collection, Data Processing, and Data Dissemination all connect to a single Corporate Metadata Repository. In each of the four phases, the 1998 Annual Survey is copied to create the 1999 Annual Survey.]
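The reuse this slide points to, carrying one year's survey metadata forward as the starting point for the next, can be sketched in a few lines. This is only an illustration: the dictionary layout, the field names (`year`, `copied_from`, `phases`) and the function `copy_survey_metadata` are hypothetical and are not taken from the CMR itself.

```python
import copy


def copy_survey_metadata(previous, new_year):
    """Return an independent copy of a survey's metadata, relabeled for a new year.

    Hypothetical helper: the real CMR copy mechanism is not described in the slides.
    """
    current = copy.deepcopy(previous)          # deep copy so later edits do not touch the source
    current["year"] = new_year
    current["copied_from"] = previous["year"]  # keep a link back to the source survey
    return current


# Illustrative metadata only; the phase names follow the slide, the contents are made up.
survey_1998 = {
    "name": "Annual Survey",
    "year": 1998,
    "phases": {
        "design": ["questionnaire layout"],
        "collection": ["collection modes"],
        "processing": ["edit rules"],
        "dissemination": ["product list"],
    },
}

survey_1999 = copy_survey_metadata(survey_1998, 1999)
survey_1999["phases"]["design"].append("new question")  # change only the 1999 copy
```

Because the copy is deep, the 1999 survey can diverge from 1998 without altering the stored 1998 metadata, which is what makes year-over-year comparison possible afterwards.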
What to include in a Corporate Metadata Repository?
- “Data about data”
  - Information about “raw” data that gives it meaning, context or enhances understanding
  - Data about the Content, Quality, Condition, and other characteristics about data
- Every informational asset that’s not data
  - Requirements, Data Models, Business Models, Screen Layouts
  - Data Mappings and transformations
  - Hierarchies, Aggregation rules, Formulas
  - Rules for comparison of data sets and historical

- Strategic to BOC Enterprise
- Opportunity for sharing and reuse of:
  - Metadata
  - Meta-Model
- Generic vs. Application specific
CMR Meta-Models

Data Element Registry (ISO/IEC 11179 Standard)
Data Elements, Value Domains, Valid Values, Data Element Concepts, …

Data Set Registry (Supports FGDC CSDGM Geospatial Metadata Standard)
A Data Set is a collection of Data Elements.

Product Registry (Supports FGDC CSDGM Geospatial Metadata Standard & Dublin Core)
A Data Product may be a file/document, website/URL, or physical object.

Data Store (OMG CWM Standard)
Metadata for the physical data store. (Supports Relational, Multi-Dimensional, and Flat File stores)

Business Rule Registry
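To make the relationship between the first two registries concrete, here is a minimal sketch of data elements, value domains, and data sets as plain Python classes. It is an assumption-laden illustration, not the CMR schema or the ISO/IEC 11179 model itself: the class and field names are hypothetical, and only the idea that a data set is a collection of data elements, each tied to a value domain and a data element concept, comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ValueDomain:
    """Permissible values for a data element (code -> meaning)."""
    name: str
    valid_values: dict


@dataclass
class DataElementConcept:
    """The idea a data element measures, independent of its coding."""
    name: str
    definition: str


@dataclass
class DataElement:
    """A concept paired with the value domain used to record it."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain


@dataclass
class DataSet:
    """A data set is a collection of data elements."""
    name: str
    elements: list = field(default_factory=list)


# One value domain registered once and shared by two data sets.
# The data set names below are hypothetical placeholders.
sex_codes = ValueDomain("Sex codes", {0: "male", 1: "female"})
gender = DataElementConcept("Person Gender", "Gender of the person")
sex = DataElement("SEX", gender, sex_codes)

dataset_a = DataSet("Example data set A", [sex])
dataset_b = DataSet("Example data set B", [sex])
```

Registering the value domain once and referencing it from every data set is one way to get the sharing and consistent standards the earlier slides ask for.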
Survey/Census: 1990 Decennial Census
Source: Bureau of the Census
Dataset: 1990 Public Use Microdata Sample (PUMS)
Description: The PUMS dataset has basic demographic information about persons and housing in the U.S. This information comes from the 1990 Decennial Census long form which is randomly sent to 1 in every 7 households. This dataset is for public use and does not compromise the confidentiality of individuals.

Data Elements:
ID - Record Identifier - A unique id for a record. Each record identifies 1 or more persons having the same demographic characteristics. (See WGT)
WGT - Person Weight - A weight given to a record to represent the 1 or more persons with the same demographic characteristics. Valid values: 1..9
SEX - Person Gender - Valid values (0: male, 1: female)
AGE - Person Age in Years - Valid values (0-90). Persons over 90 years of age are top-coded with an age of 90 for confidentiality reasons.
MARITAL - Person Marital Status - Valid values (0: not applicable, 1: single, 2: married, 3: separated, 4: divorced, 5: widowed). Universe: Persons over 15 years of age. Those 15 and under are given a value of 0.

For more information: Related Datasets and Publications, Sampling Errors and Techniques, etc.
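Metadata like the PUMS description above becomes more useful once software can act on it. The sketch below encodes the stated valid values and the MARITAL universe rule and uses them to check records, top-code ages, and total the person weights. The function names and the sample records are hypothetical; only the value ranges and rules are taken from the description.

```python
# Valid values transcribed from the PUMS data element descriptions.
PUMS_VALID_VALUES = {
    "WGT": set(range(1, 10)),      # 1..9
    "SEX": {0, 1},                 # 0: male, 1: female
    "AGE": set(range(0, 91)),      # 0-90, ages over 90 are top-coded to 90
    "MARITAL": {0, 1, 2, 3, 4, 5},
}


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = [
        name
        for name, allowed in PUMS_VALID_VALUES.items()
        if record.get(name) not in allowed
    ]
    # Universe rule: persons 15 and under must have MARITAL == 0.
    age = record.get("AGE")
    if isinstance(age, int) and age <= 15 and record.get("MARITAL") != 0:
        problems.append("MARITAL universe")
    return problems


def top_code_age(age, cap=90):
    """Apply the confidentiality top-code: ages above the cap are recorded as the cap."""
    return min(age, cap)


def weighted_persons(records):
    """Each record stands for WGT persons, so the person count is the sum of weights."""
    return sum(r["WGT"] for r in records)


# Made-up sample records for illustration only.
good = [
    {"ID": 1, "WGT": 3, "SEX": 0, "AGE": 34, "MARITAL": 2},
    {"ID": 2, "WGT": 9, "SEX": 1, "AGE": top_code_age(97), "MARITAL": 5},
]
bad = {"ID": 3, "WGT": 10, "SEX": 1, "AGE": 12, "MARITAL": 2}
```

Here `check_record(bad)` flags the out-of-range weight and the universe violation, while both records in `good` pass and together represent 12 persons.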