ebXML Registry and Repository for e-Government
Version 0.1b 6 July 2004
Document identifier:
wd-eGov-regrep-0p1b.doc
Location:
Editor:
Paul Spencer, Office of the e-Envoy, UK
([email protected])
Carl Mattocks, CheckMi ([email protected])
Contributors:
Farrukh Najmi, SUN Microsystems ([email protected])
Maewyn Cumming, Cabinet Office, UK
([email protected])
Abstract:
This document contains work-in-progress on the project to
provide a proof of concept registry repository for e-Government. It
is a working document and will change frequently.
Status:
This document is updated periodically on no particular schedule.
Committee members should send comments on this specification to the
mailto:[email protected] list.
Table of Contents
1 Initial Proposal
1.1 Overview
1.2 Introduction
1.3 Project Outline
1.4 Deliverables
1.5 Timescales
1.6 Resources
2 eGovernment – ebXML Registry Technical Note Guidelines for
e-Government Service use of ebXML Registry / Repository
2.1 Background
2.2 Service Centric Concepts
2.3 Federated Content Management
2.4 EbXML Registry Version 3
2.5 e-Government Service Requirements
2.6 Schema Component Definitions
2.7 Registration and Storage of EGSM Schema Components
2.8 Schema XML Component Name
2.9 XML Component Names
2.10 Use of Namespaces and Qualifiers
2.11 Version Management of Schema Element
2.12 Registration Of Schema Information
2.13 Publishing of Artifacts
2.14 Access to Registry / Repository
2.15 Classification of Artifacts
2.16 Storage of the knowledge embedded in a registered Data
Dictionary
2.17 Discovery and Deployment of Schema Components
2.18 Discovery and Deployment of Schema Components
2.19 Community Authoring
2.20 XML Component Suitability
2.21 BCM Template
2.22 Use of CAM Templates
3 Important Features
3.1 Phase 1
3.2 Phase 2
4 Mapping of Metadata to the ebRIM
4.1 Direct Mapping to ebRIM
4.2 Mapping via CCTS
4.3 URIs
Appendix E: Revision History
Appendix F: Notices
1 Initial Proposal
Status: v0.4 draft
Editor: Paul Spencer
1.1 Overview
This proposal is for a proof of concept / showcase project to
demonstrate how the ebXML registry, with suitable client
applications, can meet the needs of governments for a data
dictionary and XML schema registry / repository. The UK government
has offered to be the trial site for the PoC.
1.2 Introduction
The UK Government has a two-tier approach to XML data
dictionaries and XML schemas. At the top is the Government Data
Standards Catalogue (http://www.govtalk.gov.uk/gdsc/html/) with its
associated XML schemas. These are managed by the Office of the
e-Envoy (OeE) through the UK GovTalk™ website
(http://www.govtalk.gov.uk). The catalogue holds definitions used
commonly throughout government. Below this, each branch of
Government holds its own data dictionaries, some of which have XML
schemas associated with them. There are no common standards used
for the data dictionaries, but W3C XML Schema is the primary
standard for schemas.
There is clearly a need for these dictionaries and schema
repositories to be based on the same standards and to inter-operate
to help the UK and other governments meet their interoperability
aims.
1.3 Project Outline
Two branches of UK Government that already have data
dictionaries and XML schema registries/repositories are the OeE and
the Ministry of Defence (MOD). In neither case are these based
around the ebXML registry. This project is therefore to produce a
proof of concept (PoC) / showcase of how the ebXML registry and
products that implement its specifications can be used, with
suitable client applications, to meet the requirements of these
two, and by extension, other, government organizations.
The UK Government and these two organizations have been chosen
because:
1. they have existing data dictionaries / schema repositories
and so have experience of their use;
2. a recent paper for the MOD described a set of requirements
that can be used as a basis for the PoC;
3. the MOD paper also outlined requirements for the OeE that can
be expanded and confirmed; and
4. the UK Government and both organizations are willing to
participate.
In both cases, it is important that the full requirements are
met. This is likely to mean development of interfaces (such as that
required to make the OeE information available via the UK GovTalk™
web site) and client applications (such as that required for the
MOD's approval process).
High level requirements currently identified for the PoC
are:
· it must hold the data currently held in the MOD ACCORD system
and the OeE Government Data Standards Catalogue;
· it must hold XML schema representations of these items and
relate them to their definitions;
· it must be possible to create schema documents from components
held in the repository;
· it must be possible to hold multiple versions of schema
components and complete schema documents;
· it must comply with the UK Government e-GIF standards
(http://www.govtalk.gov.uk/schemasstandards/egif_document.asp?docnum=731);
· it must support the UK Government e-GMS metadata standard
(http://www.govtalk.gov.uk/schemasstandards/metadata_document.asp?docnum=832)
(it is likely that the requirement and format for serialization of
the metadata will be reviewed as part of the project);
· it must support the existing processes for the approval,
update and removal of entries;
· the solution must be scalable to at least 100,000 data
items;
· it must be possible to integrate the registry and repository
with others in the UK Government domain, the international military
domain and other domains of interest;
· it should be possible to perform a "what-if" analysis, whereby
the impact of a planned change or deletion can be assessed; and
· it should also be possible to identify unused definitions so
that they can be purged.
1.4 Deliverables
Four deliverables are proposed:
1. A set of requirements to be met by the PoC
2. A description of the PoC
3. The products that prove that the concepts can be achieved
4. A final report on the project
1.5 Timescales
The duration of the project will be xxx months from starting the
requirements paper. This will be divided as follows:
· xxx weeks to agree requirements
· xxx weeks to design the system(s)
· xxx weeks to implement the phase 1 systems
· xxx weeks to implement the phase 2 systems
· xxx weeks to populate and use the systems
· xxx weeks to produce a final report
1.6 Resources
We propose that the eGov TC sets up a sub-committee to run this
project. Members of this sub-committee should be drawn from both
supplier and client organizations.
The project will need developer effort that has not yet been
identified.
Resource requirements and how they can be met must be identified
as early as possible and confirmed at each stage of the
project.
2 eGovernment – ebXML Registry Technical Note Guidelines for
e-Government Service use of ebXML Registry / Repository
Status: v0.1 draft
Editor: Carl Mattocks
The goal of this Section is to provide guidelines on how the
standards being developed by the OASIS ebXML Registry, Business
Centric Methodology and CAM TCs can help meet the needs of
e-Government Service providers. The primary focus of the guidelines
is to support a usage scenario that includes –
· Registration and storage of Schema Components used in many
distinct Schemas
· Storage of the knowledge embedded in a registered Data
Dictionary
· Use of Data Dictionary items when managing schema
components
· Use of Registry / Repository ‘Context Declaration’ when
managing schemas employing UN/CEFACT Core Components
· Use of schema assertion facilities such as CAM (Content
Assembly Mechanism) for binding structural, contextual and
referential information to schema components
· Classification of EGSM, Data Dictionary Items, Schema
Components, Context Declarations and Content Assembly Mechanism
templates to facilitate discovery and deployment
2.1 Background
Within the OASIS open standards body there are a number of
Technical Committees (TCs) actively contributing to the evolution
of e-Government service-oriented standards. This technical note
focuses on the specifications of (i) the Business-Centric
Methodology (BCM), (ii) the ebXML Registry and (iii) the Content
Assembly Mechanism (CAM) TCs, which help explain how a Registry /
Repository can be used for the management of schema components.
Specifically, the goal of this Technical Note is to provide
standards-based guidelines on the management of web service
artifacts such as business language (nouns & verbs), commerce
metadata elements and schema properties.
2.2 Service Centric Concepts
A major emphasis of BCM is that a proper interpretation of the
business language semantics found in a SOA (Service Oriented
Architecture) metadata framework / classification system is
essential for harnessing tacit knowledge and facilitating shared
communications. Particularly, the BCM identifies that a Conceptual
Layer that enables the exploitation of community-of-interest
specific classifications, e-business taxonomies and systemic
patterns is a key factor in semantic interoperability. Further, the
contents of that BCM Conceptual Layer must be rich enough to
resolve all semantic (meaning & operability) conflicts over
terminology used to populate the many building blocks of the Lubash
Pyramid.
While not defining a mandatory structure, BCM Version 1 states
that the Conceptual Layer consists of semantic relationships and
controlled vocabularies that increase the meaning of metadata and
provide context to items that have metadata properties. The
simplest form of this is a data dictionary that contains metadata
about data elements and the relationships between simple and
complex data types. BCM expects that, when recorded in a registry,
the Conceptual Layer has the role of:
· Providing trace-ability from business vision to system
implementation
· Ensuring alignment of business concepts with automated
procedures
· Facilitating faster information utilization between business
parties
· Enabling accurate information discovery and
synchronization
· Expanding the ability to integrate information by interest,
perspective or requirement.
2.3 Federated Content Management
The BCM also identifies that a registry combined with a
repository is a key factor in the management of service-oriented
components, such as metadata about schemas, data elements, their
associative links and any stored artifacts. A registry not only
acts as an interface to a repository of stored content, it also
formalizes how information is to be registered and shared. Since
this sharing may extend beyond a single enterprise or agency, the
registry catalog must be capable of supporting metadata used for
federated content management.
Specifically, a federated content management capability is
required when there is a need for managing and accessing metadata
across physical boundaries in a secure manner. Those physical
boundaries might be the result of community-of-interest, system,
department, or enterprise separation. Irrespective of the
boundary type, federated content management enables information
users to seamlessly access, share and perform analysis on
information, which may include:
· Map of the critical path of information flowing across a
business value chain
· Quality indicators such as statements of information
integrity, authentication and certification
· Policies supporting security and privacy requirements
2.4 EbXML Registry Version 3
The ebXML Registry is a registry plus a repository. Version 3 of
the ebXML Registry / Repository supports the following types of
cooperating registry services:
· Registration and classification of any type of object
· Objects defined by data type
· Namespaces defined for certain types of content
· Messages defined as XML Schemas
· Taxonomy hosting, browsing and validation
· Association between any two objects
· Registry packages to group any objects
· Links to external content
· Built-in security
· Event notification
· Event-archiving – enabling the production of a complete audit
trail
· Service registration and discovery
· Life cycle management of objects
· Flexible query options
Note: For inter-registry relocation, replication and references,
federation metadata is stored in one registry. A registry may
cooperate with multiple federations for the purpose of federated
queries, but not for lifecycle management.
2.5 e-Government Service Requirements
A key objective of e-government service management is to achieve
common understanding between the customer and provider through
managing service level expectations and delivering and supporting
desired results. This in turn requires a common understanding of
the elements that make up those services. To achieve this using a
Registry / Repository, each registered e-Government Service
Metadata (EGSM) artifact should meet the following requirements:
· An XML schema may be derived or expressed from the EGSM
artifact, yet the EGSM artifact must not preclude other formats of
instance data from being used within an operational system in the
future.
· The EGSM artifacts shall be readable by both humans and
application actors within an infrastructure, and the applications
shall be able to consistently derive structure from the EGSM
artifacts.
· The EGSM artifacts can explicitly point at or otherwise
reference a UML or other modeling artifact via a variety of
protocols (examples – HTTP/S, LDAP, FTP).
· The e-Government Service Metadata shall have a binding to a
set of RIM metadata and/or shall minimize replication of Registry
meta-metadata instances except where required for data
portability.
· The e-Government Service Metadata shall not constrain the
final representation in any way, yet must be capable of
facilitating multiple implementation serializations (syntax
bindings) as represented via the UN/CEFACT core components
technical specification diagram.
· The EGSM artifact shall be capable of conveying semantics of
registered Data Dictionary Data elements.
· The EGSM artifact must be in a format capable of expressing
multi-byte character encoding such as UTF-16 in order to facilitate
internationalization.
· The EGSM artifact must be capable of being transformed easily
into other EGSM artifact formats (such as the UN/CEFACT ATG2 Core
Components/Business Information Entities Meta-metadata format.)
· The EGSM artifact must be capable of declaring semantic
equivalencies to other existing metadata objects. This
requirement is based on an understanding that integration with
existing systems will be essential.
· The EGSM artifact must be capable of containing an intrinsic
relationship to context declarations in order to facilitate the
above requirements, possibly in addition to the registry
relationships expressed within a registered data dictionary, ebXML
RIM and ISO/IEC 11179 parts 1-5.
· The EGSM artifact must facilitate both basic (atomic) Data
Elements as well as more complex aggregates. The aggregates to be
designated as UN/CEFACT aggregate core components (ACCs) and
represented as aggregate business core components using XML
schema.
· The EGSM artifact should be written in such a way that
programmers can write implementations that will not be broken if
the EGSM artifact model changes. This is referred to as forwards
compatibility.
2.6 Schema Component Definitions
At a business level, the primary function of XML is to provide a
meta-language for rigorously specifying the syntax of information
exchange. Since information exchange involves multiple parties (at
a minimum one sender and one receiver), XML specifies agreements
between parties within a community of interest for a particular
domain of information. XML itself does not require or provide a
mechanism for defining semantics (precisely what is meant by a
particular term); however, to achieve interoperability, both the
syntax and semantics must be explicitly defined. The process of
selecting proper component names and reaching agreements on the
definitions is primarily a business function of XML and MUST
involve all stakeholders.
The terms (XML) schema and (XML) schema document are often used
interchangeably to refer to XML documents containing schema
elements expressed in XML as described in the W3C Recommendation.
There is also a more precise technical meaning for schema, as the
exact abstract data structure required to schema-validate an
element of an XML document (this is described in detail in the W3C
XML Schema Recommendation Part 1). For the purposes of this
document, schema is normally used loosely, to mean a schema element
within an XML document. The term schema document is used to mean an
XML document containing one or more schema elements.
ebXML Registry schema component management involves using a
Registry / Repository for the registration and storage of schema
elements, XML documents and related artifacts. It specifically
includes the tasks of:
· registering proposed schema components as drafts;
· reviewing proposed schema components;
· registering approved schema components;
· discovering schema components;
· assembling complete schemas from components; and
· managing the lifecycle of the components and schemas
2.7 Registration and Storage of EGSM Schema Components
To meet the need of common understanding every registered schema
MUST contain the following metadata:
· Schema Name
· Namespace(s)
· A description of the purpose of the schema
· The name of the application or program of record that created
and/or manages the schema
· The version of the service application or program of
record
· A short description of the service application interface that
uses the schema. A URL reference to a more detailed interface
description may be provided
· Developer point of contact information to include activity,
name and email
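As an illustration only, the required metadata listed above could
be captured in a simple record and checked for completeness before
a schema is registered. The following Python sketch is not part of
any ebXML Registry API: the class name, the field names and the
missing_metadata helper are all assumptions made for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class SchemaRegistration:
    """Hypothetical record of the metadata required for a registered schema."""
    schema_name: str
    namespaces: list            # one or more namespace URIs
    purpose: str                # description of the purpose of the schema
    application_of_record: str  # application or program that created/manages it
    application_version: str
    interface_description: str  # short description of the service interface
    contact_activity: str
    contact_name: str
    contact_email: str
    interface_url: str = ""     # optional link to a fuller interface description


# Fields that may legitimately be left empty (an assumption of this sketch).
OPTIONAL_FIELDS = {"interface_url"}


def missing_metadata(registration):
    """Return the names of required fields that are empty."""
    return [
        f.name
        for f in fields(registration)
        if f.name not in OPTIONAL_FIELDS and not getattr(registration, f.name)
    ]
```

A registration would be accepted only when missing_metadata returns
an empty list; how a real registry enforces this is outside the
scope of the sketch.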
2.8 Schema XML Component Name
This section provides guidance on use of the registry, and is
non-normative.
To maximize understanding and facilitate automated analysis of
schema components during harmonization efforts the selection of XML
component names MUST be a thoughtful process involving business,
functional, data and system subject matter experts. Use of ISO
11179 conventions is encouraged. For instance, XML components MAY
be named after ISO 11179 data element names: XML Elements SHOULD be
named after ISO 11179 data element definitions when business terms
do not exist. XML Attributes SHOULD be named after ISO 11179 data
elements. XML Schema data types MUST be named after ISO 11179 data
elements.
Specifically, ISO 11179 part 5 provides a standard for creating
data elements. This standard employs a dot notation and white space
to separate the various parts of the element and multiple words in
a part respectively. In order to meet XML requirements for
component naming, the ISO 11179 name must be converted to a Name
Token. The ISO 11179 part 5 standard provides a way to precisely
create a data element definition and name. Using or referencing
this name in a schema provides analysts with a better understanding
of XML component semantics, while using business terms as element
names improves readability.
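The conversion of an ISO 11179 name to a Name Token could be
sketched as follows. The rule used here (drop the dots and white
space and join the words in camel case) is an assumption made for
illustration, since this document does not define the exact
conversion, and the function name to_name_token is hypothetical.

```python
import re


def to_name_token(iso_name, upper=True):
    """Convert an ISO 11179 style name into a camel-case XML name.

    Dots and white space are treated as separators and removed.
    With upper=False the first letter is lowered (lower camel case).
    """
    # Split on runs of dots and white space, dropping empty pieces.
    words = [w for w in re.split(r"[.\s]+", iso_name.strip()) if w]
    if not words:
        raise ValueError("name contains no words")
    # Capitalize the first letter of each word; leave the rest untouched
    # so that upper-case acronyms are preserved.
    token = "".join(w[0].upper() + w[1:] for w in words)
    if not upper:
        token = token[0].lower() + token[1:]
    return token


print(to_name_token("Goods. Delivery. Date"))           # GoodsDeliveryDate
print(to_name_token("Party. Identifier", upper=False))  # partyIdentifier
```

Keeping the original ISO 11179 name in the component's annotation
would preserve the link between the XML name and its dictionary
entry.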
2.9 XML Component Names
This section provides guidance on use of the registry, and is
non-normative.
Authors creating new elements SHOULD follow the ebXML guidance
for usage of acronyms or abbreviations in XML component names with
the following caveats. Acronyms and abbreviations SHOULD generally
be avoided in XML element and attribute names. For XML Schema data
types, abbreviations MUST be avoided while acronyms MAY be used
consistent with the rest of this guidance. When acronyms are used
they MUST be in upper case. Abbreviations SHOULD be treated as
words and expressed in upper camel case. The decision to use an
acronym or abbreviation MUST be based on the belief that its use
will promote common understanding of the information both inside a
community of interest as well as across multiple communities of
interest. When an acronym or abbreviation does not come from a
credible, identifiable source or when it introduces a margin for
interpretation error, it MUST NOT be used.
Acronyms and abbreviations used in component names MUST be
spelled out in the component definition that is required to be
included via schema annotations (as XML comments or inside XML
Schema annotation elements). References to authoritative sources
from which the acronyms or abbreviations are taken SHOULD also be
included in schema documentation.
2.10 Use of Namespaces and Qualifiers
This section provides guidance on use of the registry, and is
non-normative.
When creating a schema it is recommended that authors use a
qualifier (a prefix, normally xsd: or xs:) for the XML Schema
namespace. This makes the usage of namespaces more explicit, and
allows schema designers more flexibility in using namespaces
within the schema. See
http://www.govtalk.gov.uk/documents/Schema%20Guidelines%202.doc
Make the default namespace for the schema the same as the
targetNamespace. This allows architectural schemas with no
namespace to be included without causing namespace problems.
Use a suitable qualifier for other namespaces.
Set elementFormDefault to qualified and attributeFormDefault to
unqualified. This ensures that the user of a schema does not need
to understand its internal structure.
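The namespace guidance in this section can be illustrated with a
small schema header and a check written in Python. This is a
sketch under stated assumptions: the example.org namespace, the
PostalAddress element and the check_schema_header function are
hypothetical and are not taken from any GovTalk schema.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema; the example.org namespace is illustrative only.
SCHEMA_TEXT = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/egov/address"
            targetNamespace="http://example.org/egov/address"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:element name="PostalAddress" type="xsd:string"/>
</xsd:schema>
"""


def check_schema_header(text):
    """Return a list of deviations from the namespace guidance."""
    prefixes = {}
    root = None
    source = io.BytesIO(text.encode("utf-8"))
    # "start-ns" events arrive before the "start" event of the element
    # that declares them, so stopping at the first "start" event yields
    # exactly the declarations made on the root element.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            root = payload
            break

    if root is None or root.tag != "{%s}schema" % XSD_NS:
        return ["root element is not an XML Schema 'schema' element"]
    problems = []
    if not any(p and uri == XSD_NS for p, uri in prefixes.items()):
        problems.append("XML Schema namespace has no qualifier")
    if prefixes.get("") != root.get("targetNamespace"):
        problems.append("default namespace differs from targetNamespace")
    if root.get("elementFormDefault") != "qualified":
        problems.append("elementFormDefault is not 'qualified'")
    # attributeFormDefault defaults to 'unqualified' when omitted.
    if root.get("attributeFormDefault", "unqualified") != "unqualified":
        problems.append("attributeFormDefault is not 'unqualified'")
    return problems


print(check_schema_header(SCHEMA_TEXT))  # []
```

An empty list means the header follows all four recommendations;
each string in a non-empty list names one recommendation that is
not met.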
2.11 Version Management of Schema Element
The version management capabilities of the Registry / Repository
enable three issues of XML management to be addressed:
· proposing and approving XML data types and elements;
· version management of XML data types; and
· assembling data types into schemas for message types.
2.12 Registration Of Schema Information
The following high-level diagram shows the relationship between
registry and repository when managing XML schemas and the
documents supporting the schemas.
[Figure labels: schema & supporting docs; processor; metadata;
registry and indexed metadata; repository]
Fig 1 - registration
2.13 Publishing of Artifacts
In terms of publishing content the ebXML Registry / Repository
specification supports:
· publishing to a central registry / repository; or
· publishing to a federation of many individual registry /
repository facilities.
Note: There are therefore two basic models of distributed
information - a central repository of shared items, with individual
public sector organizations uploading and downloading as required
or a fully distributed model with the repository distributed over
multiple facilities (a local and many remote).
2.14 Access to Registry / Repository
The ebXML Registry specification supports a single point of
access to many federated Registry / Repository facilities. Thus,
it allows:
· logical duplication of remote federated repository items into
a local federated repository to fit into local policies of
information management; or
· aggregation of artifacts in the remote federated repository
for creating locally defined components; or
· access to any and all federated repository items as
required.
2.15 Classification of Artifacts
To ease discovery and deployment of artifacts, the ebXML Registry
RIM explicitly supports many Classification Schemes. Currently the
ebXML Registry allows content to be classified using a
ClassificationNode within a ClassificationScheme.
The classification scheme identified within the context of ISO
11179 and ebXML provides for a number of uses:
· Find a single element from among many
· Analyze data elements
· Convey semantic content that may be incompletely specified by
other attributes such as names and definitions
· Derive names from a controlled vocabulary
· Disambiguate between data elements of varying classification
power
Note: The basic flow consists of:
1. Schema author publishes schema components
2. Schema author classifies schema components using a class
reference within a Classification
2.16 Storage of the knowledge embedded in a registered Data
Dictionary
It is assumed that a typical Data Dictionary contains between
4000 and 100,000 entries. The concepts embedded in Data
Dictionary Elements may be sourced from many different
contributors. One source may be the synonymous Business
Information Entities used for Core Component developments. The key
difference is that the UN/CEFACT CCWG Core Components are
envisioned as a global set of business collaborations, whereas
the typical local Data Dictionary has been scoped solely for a
particular domain. The following naming rules may also be applied
to the management of Data Dictionary Elements:
· The Dictionary Entry Name shall be unique and shall consist of
Object Class, a Property Term, and Representation Type.
· “The Object Class represents the logical data grouping (in a
logical data model) to which a data element belongs” (ISO 11179).
The Object Class is the part of a core component’s Dictionary Entry
Name that represents an activity or object in a context.
· An Object Class may be individual or aggregated from core
components. It may be named by using more than one word.
· The Property Term shall represent the distinguishing
characteristic of the business entity. The Property Term shall
occur naturally in the definition.
· The Representation Type shall describe the form of the set of
valid values for an information element. If the Representation Type
of an entry is “code” there is often a need for an additional entry
for its textual representation. The Object Class and Property Term
of such entries shall be the same. (Example : “Car. Colour. Code”
and “Car. Colour. Text”).
· A Dictionary Entry Name shall not contain consecutive
redundant words. If the Property Term uses the same word as the
Representation Type, this word shall be removed from the Property
Term part of the Dictionary Entry Name. For example: If the Object
Class is “goods”, the Property Term is “delivery date”, and
Representation Type is “date”, the Dictionary Entry Name is ‘Goods.
Delivery. Date’. In adoption of this rule the Property Term
“Identification” could be omitted if the Representation Type is
“Identifier”. For example: The identifier of a party (“Party.
Identification. Identifier”) will be truncated to “Party.
Identifier”.
· One and only one Property Term is normally present in a
Dictionary Entry Name although there may be circumstances where no
property term is included; e.g. Currency. Code.
· The Representation Type shall be present in a Dictionary Entry
Name. It must not be truncated.
· To identify an object or a person by its name the
Representation Type “name” shall be used.
· A Dictionary Entry Name and all its components shall be in
singular form unless the concept itself is plural; e.g. goods.
· An Object Class as well as a Property Term may be composed of
one or more words.
· The components of a Dictionary Entry Name shall be separated
by dots followed by a space character. The words in multi-word
Object Classes and multi-word Property Terms shall be separated by
the space character. Every word shall start with a capital
letter
· Non-letter characters may only be used if required by language
rules.
· Abbreviations, acronyms and initials shall not be used as part
of a Dictionary Entry Name, except where they are used within
business terms like real words; e.g. EAN.UCC global location
number, DUNS number
· All accepted acronyms and abbreviations shall be included in
an ebXML glossary
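Several of the naming rules above are mechanical and can be
sketched in code. The following Python function is a minimal
illustration, not a complete implementation of the rules: it
covers the dot-and-space separator, capitalization, removal of a
Property Term word that repeats the Representation Type, and the
omission of "Identification" before "Identifier". The function
name dictionary_entry_name is hypothetical.

```python
def dictionary_entry_name(object_class, property_term, representation_type):
    """Build a Dictionary Entry Name from its three parts.

    The Property Term may be empty; the other two parts are required.
    """
    class_words = object_class.split()
    prop_words = property_term.split()
    rep_words = representation_type.split()
    if not class_words or not rep_words:
        raise ValueError("Object Class and Representation Type are required")

    # Drop Property Term words that repeat the Representation Type,
    # e.g. "delivery date" + "date" leaves "delivery".
    rep_lower = {w.lower() for w in rep_words}
    prop_words = [w for w in prop_words if w.lower() not in rep_lower]

    # "Identification" may be omitted when the type is "Identifier".
    if [w.lower() for w in rep_words] == ["identifier"]:
        prop_words = [w for w in prop_words if w.lower() != "identification"]

    def capitalize(words):
        # Every word starts with a capital letter; words in a part are
        # separated by a single space.
        return " ".join(w[0].upper() + w[1:] for w in words)

    parts = [capitalize(class_words)]
    if prop_words:
        parts.append(capitalize(prop_words))
    parts.append(capitalize(rep_words))
    # Parts are separated by a dot followed by a space.
    return ". ".join(parts)


print(dictionary_entry_name("goods", "delivery date", "date"))
print(dictionary_entry_name("party", "identification", "identifier"))
```

With the inputs shown, the two calls reproduce the examples given
in the rules: Goods. Delivery. Date and Party. Identifier. Rules
that need human judgment, such as singular form and the use of
acronyms, are deliberately left out of the sketch.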
2.17 Discovery and Deployment of Schema Components
It is recognized that the classification approach employed must
support the discovery and deployment of schema components in a
target namespace relating to the project for which the schema is
being developed. The stages of deployment include:
· Search for suitable components
· Develop new components
· Develop the structure of the new schema centric
documents/messages
· Register the new schema components and documents/messages
· Notify users of new versions of components that they are
using
· Identify users of obsolescent components
· Remove obsolete components
2.18 Discovery and Deployment of Schema Components
It is recognized that the classification approach employed must
support the discovery and deployment of schema components in a
target namespace relating to the project for which the schema is
being developed. The stages of deployment include:
· search for suitable components
· develop new components
· develop the structure of the new schema centric
documents/messages
· register the new schema components and documents/messages
· notify users of new versions of components that they are
using
· identify users of obsolescent components
· remove obsolete components
2.19 Community Authoring
Artifacts such as schema components and dictionary entries often
need to be developed collaboratively by a group of geographically
dispersed domain experts.
· Each Domain Expert creates a different XML component /
dictionary entry
· Each Domain Expert may review the XML component / dictionary
entry produced by others.
· Each Domain Expert may edit an XML component / dictionary entry
that they or another Domain Expert created with appropriate access
control.
Basic Flow:
1. Domain expert #1 publishes an XML component / dictionary
entry
2. Domain expert #2 publishes another XML component / dictionary
entry and connects it to the first XML component / dictionary entry
3. Domain experts #1 and #2 review each other's XML component /
dictionary entry associations
4. Domain experts #1 and #2 edit XML component / dictionary
entries to address comments or fix errors
2.20 XML Component Suitability
Given that authors wish to develop XML components only when they
are needed, it is recommended that new components are created only
when (1) suitable XML components do not exist, or (2) existing XML
components do not suffice or are not appropriate for the intended
application. Therefore, the ebXML Registry MUST be searched for
existing suitable components prior to creation of new components.
There are three possible results for this search: components may
be fully suitable or partially suitable, or no component may be
found. A component may be considered suitable if:
· It satisfies the element domain requirements,
· It is in upper/lower camel case depending on whether it is an
element, attribute or type,
· It is either named after a “business term”, or conforms to ISO
11179 conventions, and
· Its abbreviations and acronyms are spelled out in the component
definition.
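Of the suitability criteria above, the camel case rule is the one
that lends itself to an automated check. The sketch below assumes
that elements and types use upper camel case and attributes use
lower camel case; this mapping is an assumption chosen for the
example, as this section does not state which case applies to
which kind of component. The function name has_expected_case is
hypothetical.

```python
import re

# Letters and digits only; the first character decides the case style.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")

# Assumed mapping of component kind to case style (not from the source).
EXPECTED_CASE = {
    "element": UPPER_CAMEL,
    "type": UPPER_CAMEL,
    "attribute": LOWER_CAMEL,
}


def has_expected_case(name, kind):
    """Return True if the component name matches the case style for its kind."""
    return EXPECTED_CASE[kind].fullmatch(name) is not None


print(has_expected_case("PostalAddress", "element"))    # True
print(has_expected_case("postal_address", "element"))   # False
print(has_expected_case("countryCode", "attribute"))    # True
```

The remaining criteria (domain fit, business term or ISO 11179
naming, spelled-out acronyms) depend on the component definition
and would still need review by a subject matter expert.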
2.21 BCM Template
Following on from the template definitions in the business
layers, the BCM method proceeds first to establish the templates of
a collaboration agreement and optionally a traditional memorandum
of agreement (item 4 in 5.3.1). Once the collaboration is agreed,
then the associated information exchanges to implement that
collaboration can be defined (items 8, 9, 10 in 5.3.1). The
information transactions require careful detailing of the
semantics. There are verbs, nouns, roles, rules and message
structures to quantify. In traditional software development this is
the place most people begin. The question frequently asked is “do
we have an XML schema to use?” with the assumption that, if so,
the participants are ready to start exchanging XML conforming to
the schema and facilitating eBusiness. Experience has shown, and
the BCM expects, that effective information exchanges, especially
across an industry group with multiple participants, require a
greater depth of semantic knowledge than a simple schema provides.
By contrast, an OASIS CAM template definition provides the entire
noun, verb and context semantics for complete transaction
management, including integration into a registry vocabulary
dictionary, without the need for highly specialized software.
2.22 Use of CAM Templates
The Schematron is a language and toolkit for making assertions
about patterns found in XML documents. It can be used as a friendly
validation language and for automatically generating external
annotation (links, RDF, perhaps Topic Maps). Because it uses paths
rather than grammars, it can be used to assert many constraints
that cannot be expressed by DTDs or XML Schemas.
The Content Assembly Mechanism employs templates to bind
structural, contextual and referential information to schema
components. The CAM may be used to allow dynamic assignment of
context to a Schema Component instance. The figure below outlines
how those information facets may be brought together for a
‘Reliable Messaging System’.
Figure: XML business information (Schema, Assembly, Delivery)
Schema: content structure definition and simple content typing.
Content Assembly: business logic for content structure decisions
and explicit rules to enforce content and interdependencies, with
business exchange context, and content definition cross-references
via UID associations.
Secure Authenticated Delivery and Tracking: Reliable Messaging
system, envelope format and payload with exchange participant
profile controls.
Registry/Dictionary: the UID content referencing system ensures
consistent definition usage.
UID: Universal ID content referencing system. UID values comprise
a domain prefix, a six digit integer, and an optional version and
sub-version.
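The UID structure described above (domain prefix, six digit
integer, optional version and sub-version) can be sketched as a
small parser. The text gives the parts but not how they are
written down, so the layout used here (alphabetic prefix
immediately followed by six digits, then ":version" and
":sub-version") is an assumption, as are the names Uid and
parse_uid and the example prefix EGOV.

```python
import re
from typing import NamedTuple, Optional


class Uid(NamedTuple):
    prefix: str                 # domain prefix
    number: int                 # the six digit integer
    version: Optional[str]      # None when omitted
    sub_version: Optional[str]  # None when omitted


# ASSUMPTION: colon-separated version parts; the separator is not
# specified in the text and is chosen here for illustration only.
UID_PATTERN = re.compile(r"^([A-Za-z]+)(\d{6})(?::(\d+)(?::(\d+))?)?$")


def parse_uid(text: str) -> Uid:
    """Split a UID string into its parts, or raise ValueError."""
    match = UID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a UID in the assumed layout: {text!r}")
    prefix, digits, version, sub_version = match.groups()
    return Uid(prefix, int(digits), version, sub_version)
```

Under these assumptions, parse_uid("EGOV000123:01:05") yields the
prefix "EGOV", the number 123, version "01" and sub-version "05".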
3 Important Features
Status: v0.1 draft
Editor: Paul Spencer
This section attempts to highlight, as a set of bullet points,
the major features required for the registry/repository at the
proof of concept stage. It excludes features inherent in the
registry (such as version control and user notification) that are
assumed to be included.
Most benefit will be gained by an early release. I have
therefore split this into two phases.
3.1 Phase 1
1. The ability to enter schemas and the associated metadata into
the registry/repository.
2. The ability to enter schema components (global data types,
elements and attributes) and associated metadata into the
registry/repository.
3. The ability to enter other document types with associated
metadata into the registry/repository.
4. The ability to hold schema definitions in a
syntax-independent manner (e.g. as defined in CCTS?). Effectively,
this means that, for schema components, sufficient information must
be held in the registry to create the component from metadata,
although the ability to create the components will not be
included.
5. The metadata to be supported will vary according to the three
document types (schema, schema component or other) and will be a
subset of that defined in the UK e-GMS plus the additional
requirements of point 4.
6. The ability to search on certain metadata information and
extract all matching schemas, components or other documents.
7. The ability to construct schemas from components.
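Points 1 to 3 and 6 above can be pictured with a very small
in-memory model. This is an illustrative sketch only, not the
registry interface: the class and method names are hypothetical,
metadata is reduced to plain name/value strings, and matching is
by exact value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The three document types distinguished in point 5.
KINDS = ("schema", "component", "other")


@dataclass
class Entry:
    name: str
    kind: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory stand-in for the registry/repository."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        # Points 1 to 3: enter a document with its metadata.
        if entry.kind not in KINDS:
            raise ValueError(f"unknown document type: {entry.kind}")
        self._entries.append(entry)

    def search(self, criteria: Dict[str, str],
               kind: Optional[str] = None) -> List[Entry]:
        # Point 6: return every entry matching all the given metadata.
        return [
            e for e in self._entries
            if (kind is None or e.kind == kind)
            and all(e.metadata.get(k) == v for k, v in criteria.items())
        ]
```

For example, after adding a schema and a component that share the
same Creator value, search({"Creator": ...}) returns both, and
passing kind="component" narrows the result to the component.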
3.2 Phase 2
8. The ability to interoperate between registries.
9. The ability to add MOD-specific metadata.
4 Mapping of Metadata to the ebRIM
Status: v0.1 draft
Editor: Paul Spencer
It is not clear whether it is best to map metadata elements
directly to the ebRIM for Government Use or go through a CCTS
mapping as an interim stage. We are therefore trying both. See the
email from Carl Mattocks:
… Since the CCTS approach is still in its infancy, I predict
that the task will take longer than we would like. Therefore, I
propose that to make the best of the situation, we do -
(1) a FULL direct to ebRIM mapping for the selected e-GMS
metadata subset
AND
(2) a FULL mapping to CCTS (and then using CCRIM) for the
selected e-GMS metadata subset
AND
(3) document the results of BOTH in the Technical Note. ...
and let the reader be aware they have a choice.
Agreed - it would be useful to publish a sample of (2) above.
Hopefully, this can be done in a couple of weeks.
This is the approach we are taking.
4.1 Direct Mapping to ebRIM
Notes:
1. Some names are abbreviated and shown with an ellipsis. See
the e-GMS for full names.
2. Where a refinement name starts with the name of its parent
item, the parent name has been omitted for brevity. See the e-GMS
for full names.
3. The last three columns indicate whether the metadata item is
to be supported for schema documents, schema components and other
document types. The codes used are:
a. M: mandatory
b. MA: mandatory if applicable
c. R: recommended
d. RA: recommended if applicable
e. O: optional
f. n/a: not required
The code is in bold if the metadata item is to be supported in the
PoC.
4. The usage in schema documents is based on the e-GMS local
metadata standard - XML schemas version 3 (draft) and an email from
Maewyn Cumming to Paul Spencer on 2004-06-21. It is still under
discussion, but the values here should be used for initial
implementation.
5. The columns for schema components and other document types
are to be completed.
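The usage codes in note 3 can be represented as an enumeration and
used to check a metadata record. The sketch below is illustrative:
the names are hypothetical, only five items are included, and it
assumes that the codes shown for those items in the direct mapping
table of this section (Creator, Title and Date. Modified as M,
Language as R, Description as O) apply to schema documents.

```python
from enum import Enum
from typing import Dict, List


class Obligation(Enum):
    MANDATORY = "M"
    MANDATORY_IF_APPLICABLE = "MA"
    RECOMMENDED = "R"
    RECOMMENDED_IF_APPLICABLE = "RA"
    OPTIONAL = "O"
    NOT_REQUIRED = "n/a"


# ASSUMPTION: a small illustrative subset, read as schema document usage.
SCHEMA_DOC_USAGE: Dict[str, Obligation] = {
    "Creator": Obligation.MANDATORY,
    "Title": Obligation.MANDATORY,
    "Date. Modified": Obligation.MANDATORY,
    "Language": Obligation.RECOMMENDED,
    "Description": Obligation.OPTIONAL,
}


def missing_mandatory(metadata: Dict[str, str]) -> List[str]:
    """List the mandatory items that are absent or empty in `metadata`."""
    return sorted(
        item for item, code in SCHEMA_DOC_USAGE.items()
        if code is Obligation.MANDATORY and not metadata.get(item)
    )
```

A record holding only Creator and Title would be reported as
missing "Date. Modified". The "if applicable" codes are not checked
because applicability cannot be decided from the record alone.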
Columns, in order: UK e-GMS | Enumeration | Mapping to RIM | Autogenerate | Schema Docs | Schema Comps | Other
Each row below gives the e-GMS item followed by its non-empty cells in column order.
Accessibility | n/a | n/a
Addressee | n/a | n/a
Aggregation | this might be modelled through RIM Associations. It is relevant if some document is part of a larger collection. | O
Audience | n/a | n/a
Contributor | Association to Person (User for now) or Organization with associationType “Contributor” | MA
Coverage | MA
Coverage. Spatial | Classification using chosen GEO ClassificationScheme | RA
Coverage. Temporal | Must support | RA
Creator | Association to Person (User for now) or Organization with associationType “Creator” | M
Date | Must support | O
Date. Acquired | n/a
Date. Available | n/a
Date. Created | 2003-04-06 | R
Date. Cut-off | n/a | n/a
Date. Closed | n/a | n/a
Date. Accepted | n/a | n/a
Date. Copyrighted | O
Date. Submitted | n/a
Date. Declared | n/a
Date. Issued | yes | MA
Date. Modified | yes | M
Date. NextVersionDue | O
Date. UpdatingFrequency | O
Date. Valid | MA
Description | Description | O
Description. Abstract | n/a
Description. TableOfContents | n/a
DigitalSignature | n/a | n/a
Disposal | n/a | n/a
Disposal. AutoRemoveDate | n/a | n/a
Disposal. Action | deprecate, remove, archive | value is String or id of an Action ClassificationNode? | O
Disposal. AuthorisedBy | O
Disposal. Comment | O
Disposal. Conditions | O
Disposal. Date | O
Disposal. ExportStatus | O
Disposal. Review | O
Disposal. ReviewerDetails | value is id of User | O
Disposal. ScheduleID | O
Disposal. TimePeriod | O
Format | for schemas & comps: text/xml | This should probably be supported as an alternative to the refinements below that are not included in the e-GMS v3. For schemas, this would always have the value "Text/http://www.w3.org/2001/XMLSchema" and so could be autogenerated when serialising metadata. | yes (for schemas & comps) | M
Format. Extent | n/a
Format. Medium | n/a
Identifier | ExternalIdentifier | M
Identifier. BibliographicCitation | Identification ClassificationScheme BibliographicCitation | n/a
Identifier. CaseID | Identification ClassificationScheme CaseID | n/a
Identifier. FileplanID | Identification ClassificationScheme FilePlanID | n/a
Identifier. SystemID | Identification ClassificationScheme SystemID | n/a
Language | This could be an enumeration of the ISO 639-2/B language codes using the UBL codelist format, but I would leave it as a slot for now. | R
Location | n/a | n/a
Mandate | n/a | n/a
Mandate. AuthorisingStatute | n/a | n/a
Mandate. DataProtection… | n/a | n/a
Mandate. PersonalData… | n/a | n/a
Preservation | n/a | n/a
Preservation. OriginalFormat | n/a | n/a
Publisher | Association to Person (User for now) or Organization with associationType “Publisher” | ? | M
Relation | Association with associationType matching refinement for Relation. Relation can be used without refinements (for example to link to supporting documents). | n/a | n/a
Relation. ConformsTo | http://www.w3.org/2001/XMLSchema | Must Support | M | n/a
Relation. HasFormat | Must Support | MA | n/a
Relation. HasVersion | Must Support | MA | n/a
Relation. HasPart | Association with associationType “HasPart” | yes (for schemas) | MA | n/a
Relation. IsDefinedBy | Must support* | MA | n/a
Relation. IsFormatOf | n/a | n/a | n/a
Relation. IsPartOf | Association with associationType “IsPartOf” | MA | n/a
Relation. IsReferencedBy | n/a | n/a | n/a
Relation. IsReplacedBy | Must support* | MA | n/a
Relation. IsRequiredBy | Must support | n/a | n/a
Relation. IsVersionOf | Must support | MA | n/a
Relation. ProvidesDefinitionOf | Association with associationType “ProvidesDefinitionOf” | MA | n/a
Relation. ReasonForRedaction | n/a | n/a | n/a
Relation. Redaction | n/a | n/a | n/a
Relation. References | n/a | n/a | n/a
Relation. Requires | Association with associationType “Requires” | yes (for schemas) | MA | n/a
Relation. Replaces | Must support* | MA | n/a
Relation. SequenceNo | n/a | n/a | n/a
Rights | n/a
Rights. Copyright | O
Rights. Custodian | value is id of a User or SubjectRole or SubjectGroup | O
Rights. Descriptor | n/a | n/a
Rights. DisclosabilityTo… | n/a | n/a
Rights. DPADataSubject… | n/a | n/a
Rights. EIRDisclosability… | n/a | n/a
Rights. EIRExemption | n/a | n/a
Rights. FOIADisclosability… | n/a | n/a
Rights. FOIAExemption | n/a | n/a
Rights. FOIAReleaseDetails | n/a | n/a
Rights. FOIAReleaseDate | n/a | n/a
Rights. GroupAccess | n/a | n/a
Rights. IndividualUser… | n/a | n/a
Rights. LastFOIA… | n/a | n/a
Rights. PreviousProtectiveMarking | n/a | O
Rights. ProtectiveMarking | Could we leave this in until I have spoken to Maewyn. I think the MOD will want this. | O
Rights. ProtectiveMarkingChangeDate | n/a | O
Rights. ProtectiveMarkingExpiryDate | n/a | O
Source | n/a | n/a
Status | Must Support* This seems to complement the RIM status, and could be a qualifier added to that. | O
Subject | n/a
Subject. Category | Uses Government Category List | M
Subject. Keyword | Multiple values for each Slot or multiple Slots, one per keyword? | O
Subject. Person | n/a | n/a
Subject. ProcessIdentifier | O
Subject. Programme | O
Subject. Project | O
Title | M
Title. AlternativeTitle | n/a | n/a
Type | [empty string], message, architectural, element, type | M
refinements of Type | n/a | n/a
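Several rows of the table map an e-GMS item to a RIM Association
whose associationType carries the item or refinement name. A
minimal sketch of that mapping follows. It is not the registry
API: the Association class is a simplified stand-in, the function
name is hypothetical, and the direction chosen (registered
document as source, the User, Organization or related object as
target) is an assumption.

```python
from dataclasses import dataclass

# Items that the table maps to an Association, with the associationType
# value given there.
ASSOCIATION_TYPES = {
    "Contributor": "Contributor",
    "Creator": "Creator",
    "Publisher": "Publisher",
    "Relation. HasPart": "HasPart",
    "Relation. IsPartOf": "IsPartOf",
    "Relation. ProvidesDefinitionOf": "ProvidesDefinitionOf",
    "Relation. Requires": "Requires",
}


@dataclass(frozen=True)
class Association:
    source_object: str     # ASSUMPTION: id of the registered document
    target_object: str     # ASSUMPTION: id of the User, Organization, etc.
    association_type: str


def to_association(item: str, document_id: str, target_id: str) -> Association:
    """Build the Association for an e-GMS item that the table maps this way."""
    try:
        association_type = ASSOCIATION_TYPES[item]
    except KeyError:
        raise ValueError(f"{item!r} is not mapped to an Association") from None
    return Association(document_id, target_id, association_type)
```

Items mapped in other ways (for example Identifier to
ExternalIdentifier, or Coverage. Spatial to a Classification) would
need their own constructs and are left out of this sketch.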
4.2 Mapping via CCTS
Awaiting information from Carl
4.3 URIs
When mapping to the ebRIM, URIs are used as identifiers in
various places. The two types of URI usually used in such cases are
URNs (e.g. urn:gov:uk:egms:date) and URLs (e.g.
http://www.govtalk.gov.uk/terms/copyrighted). In general, OASIS
prefers the use of the URN.
However, the e-GMS is based on Dublin Core, which uses URLs to
specify metadata names. This is an extract from an email from
Maewyn Cumming (2004-06-18):
We had thought about this for the e-GMS application profile and
used the format http://www.govtalk.gov.uk/terms/accessibility
for each element, refinement etc. This follows the Dublin
Core model, and is what we have put into the AP (though with the
caveat that none of these URLs actually work yet). I'd like to keep
following the same format.
In discussion, it was agreed that a refinement would use an
additional oblique, such as
http://www.govtalk.gov.uk/terms/date/created.
This is the format to be used for e-GMS metadata but does not
constrain the format for other types of metadata.
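The agreed format can be sketched as a small helper. The base URL
and the two example results come from the text above; the function
name is hypothetical, and lower-casing every name is an assumption
based on those examples, since the treatment of multi-word names
such as NextVersionDue is not stated.

```python
EGMS_BASE = "http://www.govtalk.gov.uk/terms/"


def egms_uri(element: str, refinement: str = "") -> str:
    """Return the URL for an e-GMS element, or for one of its refinements."""
    parts = [element]
    if refinement:
        # A refinement is appended after an additional oblique.
        parts.append(refinement)
    # ASSUMPTION: names are lower-cased, as in the examples in the text.
    return EGMS_BASE + "/".join(p.strip().lower() for p in parts)
```

With this helper, egms_uri("Accessibility") gives
http://www.govtalk.gov.uk/terms/accessibility and
egms_uri("Date", "Created") gives
http://www.govtalk.gov.uk/terms/date/created.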
Appendix E: Revision History
Rev | Date | What
0.1a | 2 July 2004 | First draft to pull together some existing documents.
0.1b | 6 July 2004 | Additional column added to direct mapping table to indicate which metadata items are to be auto-generated by the registry. Other minor changes to table.
Appendix F: Notices
OASIS takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on
OASIS's procedures with respect to rights in OASIS specifications
can be found at the OASIS website. Copies of claims of rights made
available for publication and any assurances of licenses to be made
available, or the result of an attempt made to obtain a general
license or permission for the use of such proprietary rights by
implementors or users of this specification, can be obtained from
the OASIS Executive Director.
OASIS invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to implement
this specification. Please address the information to the OASIS
Executive Director.
Copyright © OASIS Open 2002. All Rights Reserved.
This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to OASIS, except
as needed for the purpose of developing OASIS specifications, in
which case the procedures for copyrights defined in the OASIS
Intellectual Property Rights document must be followed, or as
required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not
be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided
on an “AS IS” basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
EML uses Schematron to define
context-sensitive rules, and this works well. I don't want to
exclude CAM (which I think is Content Assembly Mechanism), but we
should be able to link Schematron artifacts as well.
I have made these up to allow a
classification. Do they seem reasonable? Or is free text
better?
I don't think so … But we probably
don't need it anyway.