The <indecs> Data Dictionary Norman Paskin, International DOI Foundation ELECTRONIC COMMUNICATION OF LICENCE TERMS

Mar 26, 2015

Page 1: The Data Dictionary Norman Paskin, International DOI Foundation ELECTRONIC COMMUNICATION OF LICENCE TERMS.

The <indecs> Data Dictionary

Norman Paskin, International DOI Foundation

ELECTRONIC COMMUNICATION OF LICENCE TERMS

Page 2:

<indecs>

• 1998-2000: Interoperability of Data in E-Commerce Systems: www.indecs.org

• Focus: generic intellectual property and how to make data about it interoperable

• EC + groups from the content, author, creator, library, publisher and rights communities

• Pioneered a model of event-based metadata as a solution for integrating rights.

– For “e-commerce” read “automation”

• Influenced by CIS and FRBR:

– 1995+: Common Information System “CIS” (CISAC), music rights

– 1998: Functional Requirements for Bibliographic Records “FRBR” (IFLA), library cataloguing

• Has been used and developed further

Page 3:

Why do we need a “data dictionary”?

• There’s lots of metadata already

– Which should be (re-)used

• People use different schemes

– So we need to map from one scheme to another

• Data (identifiers, metadata) assigned in one context or scheme may be encountered, and may be re-used, in another place (or time or scheme) - without consulting the assigner. You can’t assume that your assumptions will be known to someone else.

– Interoperability = the possibility of use in services outside the direct control of the issuing assigner

– This is a prerequisite for communication (of rights terms or anything else)

• Does “owner” in scheme A mean “owner” in scheme B?

– We need to map meanings

– A prerequisite for extensibility

Page 4:

What is a “data dictionary”?

• A set of terms, with their definitions, used in a computerized system

• Some data dictionaries are structured, with terms related to other terms through hierarchies and other relationships: structured data dictionaries are derived from ontologies.

• An ontology combines a data dictionary with a logical data model, providing a consistent and logical world view.

• An interoperable data dictionary contains terms from multiple computerized systems or metadata schemes, and shows the relationships they have with one another in a formal way.

• The purpose of an interoperable data dictionary is to support the use together of terms from different systems.

• The <indecs> DD is structured (ontology-based) and interoperable

Page 5:

[Figure: two metadata schemes (e.g. ONIX and SCORM) linked by an agreed term-by-term mapping, or “crosswalk”.]

Page 6:

[Figure: metadata schemes, e.g. ONIX and SCORM.]

Page 7:

[Figure: a Data Dictionary relating metadata schemes (e.g. ONIX, SCORM and NormanRights) through their terms. Example: the term “Author” in ONIX and the term “Writer” in the scheme NormanRights, mapped as ONIX:Author = NormanRights:Writer.]
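The mapping in this figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iDD implementation: the concept ID `dd:creator-of-text` and the function `translate` are hypothetical names introduced here, and only the two terms shown on the slide are registered.

```python
# Each scheme term is registered once against a shared dictionary concept.
# ASSUMPTION: the concept ID "dd:creator-of-text" is a hypothetical placeholder.
DICTIONARY = {
    ("ONIX", "Author"): "dd:creator-of-text",
    ("NormanRights", "Writer"): "dd:creator-of-text",
}


def translate(scheme, term, target_scheme):
    """Return the terms in target_scheme that share a concept with scheme:term."""
    concept = DICTIONARY.get((scheme, term))
    if concept is None:
        return []  # the term is not registered in the dictionary
    return sorted(
        t for (s, t), c in DICTIONARY.items()
        if s == target_scheme and c == concept
    )


print(translate("ONIX", "Author", "NormanRights"))
```

In this arrangement each scheme is mapped once to the dictionary, rather than separately to every other scheme as a set of pairwise crosswalks would require.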

Page 8:

Metadata interoperability: semantic problems

But such mappings are not simple:

• Different names (and languages) for the same thing (journal_article vs SerialArticleWork)

• Same name for different things (title, Title)

• Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle).

• Different allowed values for elements (pii vs not pii)

• Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion).

• Data in different structures (article as attribute of journal or vice versa).

• Data from different sources (local codes vs ONIX codes).

• Different contextual meaning (DOI of what…?)

• Different representation (1 title vs n titles).

• Different mandatory requirements (ISSN mandatory vs optional)

• Schemas are being updated all the time, and so on.

This requires a coherent, structured approach.

Page 9:

So how do we make sense of this?

• The data dictionary uses an “ontology”

• “An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them”

• Because relationships can be complex

Page 10:

The dictionary model

• The methodology is the <indecs> one (as developed in more detail for the MPEG RDD)

• This has also been developed further (OntologyX)

• It uses the “context model” – i.e. events based (a common ontology approach)

• We think of metadata as “thing” or “people” based.

– static views e.g. about “creation B”

• But then how do we link things, e.g. to describe rights activities?

• By describing “events”, which relate things and people

– dynamic views e.g. “A created B”

• Event description is also the key to rights metadata

– all rights transactions are events

Page 11:

The dictionary model

[Figure: the four basic entities of the model: Agent, Place, Time, Resource.]

Page 12:

The dictionary model

[Figure: the four basic entities with example values. Agent: Norman Paskin. Place: London. Time: 2004-12-02. Resource: 041202BICNISO.ppt.]

Page 13:

The dictionary model

[Figure: Agent, Place, Time and Resource brought together in one event.]

Event: Norman Paskin presented 041202.ppt in London on 2 Dec 2004
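As a rough illustration of the event on this slide, the four entities can be held in one small record. This is a sketch, not part of the <indecs> specification: the class name `Event`, the `action` field and the `describe` method are assumptions introduced here, and the values are the ones shown on the preceding slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event relating the four basic entities of the model."""
    agent: str
    action: str    # ASSUMPTION: the verb is an extra field, not one of the four entities
    resource: str
    place: str
    time: str

    def describe(self):
        """Render the event as a single sentence."""
        return f"{self.agent} {self.action} {self.resource} in {self.place} on {self.time}"


talk = Event(
    agent="Norman Paskin",
    action="presented",
    resource="041202BICNISO.ppt",
    place="London",
    time="2004-12-02",
)
print(talk.describe())
```

The static "things about" statements (who the presenter was, where the file was shown) can each be read off this one record.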

Page 14:

[Figure: Context Model, linking a Context and its ContextType to Agent, Place, Resource and Time. Key: values of basic terms (Agent, Place, Resource, Time); types of basic terms (AgentType, PlaceType, ResourceType, TimeType); relating terms (HasAgentType, HasPlaceType, HasResourceType, HasTimeType, HasValue).]

Page 15:

[Figure: ActionFamilyRelationalView (“AFRV”). Values of basic terms: Agent, Place, Resource, Time. AFRV relating terms for the “Act” action family: IsAgentInPlace, IsPlaceOfActingBy, IsAgentActingOn, IsActedOnBy, IsAgentAtTime, IsTimeOfActingBy, IsResourceAtTime, IsTimeOfBeingActedOnOf, IsResourceInPlace, IsPlaceOfBeingActedOnOf, IsPlaceOfActingAtTime, IsTimeOfActingInPlace, HasCo-Agent, HasCo-Resource, HasCo-PlaceOfActing, HasCo-TimeOfActing.]

Page 16:

[Figure: the RDD Context Model and the ActionFamilyRelationalView (“AFRV”) shown together, with the same terms as on the two preceding pages.]

The Context Model and the ActionFamilyRelationalView (“AFRV”) are two different models of the relationships between the entities Agent, Place, Resource and Time.
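One way to picture the difference is to expand a single act into pairwise statements between its participants. The sketch below is speculative: which relating term links which ordered pair of entities is inferred here from the term names alone, since the slides do not spell it out, and the function `relational_view` is a hypothetical helper. The HasCo- terms are left out because they would need more than one value per role.

```python
from itertools import permutations

# Role values for one act (taken from the event example earlier in the deck).
act = {
    "Agent": "Norman Paskin",
    "Resource": "041202BICNISO.ppt",
    "Place": "London",
    "Time": "2004-12-02",
}

# ASSUMPTION: the pairing of each term with an ordered pair of roles is
# guessed from the term names; it is not stated in the source.
AFRV_TERMS = {
    ("Agent", "Resource"): "IsAgentActingOn",
    ("Resource", "Agent"): "IsActedOnBy",
    ("Agent", "Place"): "IsAgentInPlace",
    ("Place", "Agent"): "IsPlaceOfActingBy",
    ("Agent", "Time"): "IsAgentAtTime",
    ("Time", "Agent"): "IsTimeOfActingBy",
    ("Resource", "Place"): "IsResourceInPlace",
    ("Place", "Resource"): "IsPlaceOfBeingActedOnOf",
    ("Resource", "Time"): "IsResourceAtTime",
    ("Time", "Resource"): "IsTimeOfBeingActedOnOf",
    ("Place", "Time"): "IsPlaceOfActingAtTime",
    ("Time", "Place"): "IsTimeOfActingInPlace",
}


def relational_view(act):
    """Expand one act into pairwise (subject, term, object) statements."""
    return [
        (act[a], AFRV_TERMS[(a, b)], act[b])
        for a, b in permutations(act, 2)
    ]


statements = relational_view(act)
for statement in statements:
    print(statement)
```

Under this reading, one context with four roles yields twelve directed statements, which is the sense in which the two models describe the same relationships from different angles.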

Page 17:

Building views of “metadata”…

• Q: “This isn’t how I think of my metadata! It’s just a series of ‘things about’ something. How does this more complex approach fit what I have?”

• A: This is simply a deeper view for the purposes of analysis. You don’t need to change your own approach.

The “events” view builds from the simple “things about” view:

Page 18:

Building views of “metadata”…

1. attribute view – simplest, most direct: “things about…”

[Diagram: entity, attribute]

isbn “0297816470”

Author S Pinker

(values may be strings, IDs etc)

Page 19:

Building views of “metadata”…

2. association or relationship view – richer, more indirect:

[Diagram: entity, relationship, entity]

book “0297816470” hasTitle “Words & Rules”

• treats attributes as defined entities

and others e.g. book “0297816470” hasAuthor “Steven Pinker”

• allows multiple occurrences

Page 20:

Building views of “metadata”…

3. context view – richest, most indirect

[Diagram: a context linking agent, resource, time and place]

publishingEvent hasAgentType publisher “Weidenfeld”

publishingEvent hasResourceType book “0297816470”

publishingEvent hasTimeType dateOfPublication “2002”

publishingEvent hasPlaceType placeOfPublication “UK”

• Analysis moves from attribution to attribution process (Event)

• Most efficient handling of complex multiple metadata e.g. a rights catalogue (“all rights transactions are events”)

• Allows analysis of complex relationships and meaning

Page 21:

An ontology approach uses the deeper view of metadata

Three levels of attribution, moving from simple (static) to richer (dynamic events):

• Attribute (static view): entity, attribute

• Relationship: entity, relationship, entity

• Context (dynamic view): a context linking agent, resource, time and place
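The three levels can be put side by side in a short Python sketch using the “Words & Rules” example from the preceding slides. The data structures and the helper names `to_relationships` and `attributes_from_context` are assumptions made for illustration; the slides do not prescribe any particular representation.

```python
# 1. Attribute view: "things about" one entity.
attribute_view = {"isbn": "0297816470", "title": "Words & Rules"}


# 2. Relationship view: each attribute becomes an (entity, relationship, entity) triple.
def to_relationships(entity_type, id_key, record):
    subject = (entity_type, record[id_key])
    return [
        (subject, "has" + key[0].upper() + key[1:], value)
        for key, value in record.items()
        if key != id_key
    ]


# 3. Context view: an event relates typed agent, resource, time and place.
publishing_event = {
    "hasAgentType": ("publisher", "Weidenfeld"),
    "hasResourceType": ("book", "0297816470"),
    "hasTimeType": ("dateOfPublication", "2002"),
    "hasPlaceType": ("placeOfPublication", "UK"),
}


def attributes_from_context(event):
    """Collapse an event back into a flat attribute record (the static view)."""
    return {type_name: value for type_name, value in event.values()}


print(to_relationships("book", "isbn", attribute_view))
print(attributes_from_context(publishing_event))
```

The last function shows the direction the slides emphasise: the simpler views can be derived from the context view, so an existing attribute-style scheme does not have to change.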

Page 22:

Tested

• iDD has a long history and is used in several major activities.

• Built using methodology from the <indecs> framework

• Used as the basis for the DOI data model

• Used as the basis for the MPEG-21 Rights Data Dictionary (RDD)

• Heavily influenced the current development of messaging systems for the publishing industry (ONIX) and music industry (MI3P).

• Methodology has been validated against the W3C ontology language OWL-DL

• Methodology for constructing interoperable Data Dictionaries which underlies iDD is in use commercially (Ontologyx).

• The International DOI Foundation (IDF) and EDItEUR intend to harmonise ONIX and DOI metadata through the use of this common data dictionary, and welcome collaboration with others adopting a similar approach

Page 23:

Neutral as to business model

• The semantic analysis underlying the iDD is independent of any implementation model.

• It was fundamental to indecs (despite “e-commerce” in its name) that it had no inherent commercial model, and it remains so for all the work that has followed it.

• It is just as critical to be able to say "this is not subject to copyright" as to say the opposite;

– any "non-commercial" person or organization has to be able to state that something is freely available and under what circumstances.

• A broad ontology, supporting rights expressions, must be able to support any kind of expression of any kind of right, agreement or licence or any terms (or none).

• Most organizations have the need for both freedom and protection of intellectual property in different contexts.

– The iDD is not solely a tool for intellectual property as “commercial property” but is neutral as to the intellectual property regime being used.

Page 24:

Does not mandate one metadata scheme

• The aim of the iDD is to facilitate mapping between schemes

• The more precise the input, the more precise the output

– e.g. a mapping from simple DC to SCORM will of necessity be “lossy”

• Some uses will set minimum standards

– e.g. DOI Registration Agencies have rules that must be followed in the DOI application to ensure that the metadata can be mapped into the iDD to declare Application Profiles

• Any user is otherwise free to use their own metadata schemes for gathering, storing or disseminating metadata. iDD facilitates input and output to other schemes = semantic interoperability
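Why a mapping involving a simpler scheme is “lossy” can be seen in a small sketch. It is an illustration only: the generalisation table is hypothetical, the element names are borrowed from the semantic-problems slide (title vs FullTitle, AlternativeTitle) rather than from a real DC or SCORM crosswalk, and the values are placeholders.

```python
# ASSUMPTION: a hypothetical table saying which simple element each
# precise element generalises to; not taken from any real crosswalk.
BROADER_TERM = {"FullTitle": "title", "AlternativeTitle": "title"}


def to_simple(record):
    """Map a precise record onto the simpler scheme by generalising each element."""
    simple = {}
    for element, value in record.items():
        simple.setdefault(BROADER_TERM[element], []).append(value)
    return simple


# Placeholder values, used only to show the shape of the data.
precise = {
    "FullTitle": "full title text",
    "AlternativeTitle": "alternative title text",
}
simple = to_simple(precise)
print(simple)
```

Once both values sit under `title`, nothing in the simple record says which was the full title and which the alternative, so a mapping that starts from the simple scheme cannot supply that distinction.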

Page 25:

Provides authority

• Every term entered into the iDD carries information on its status as to origin and mapping agreement

• If reciprocally agreed, a mapping can be an assured mapping

– which will enable users of the dictionary to interpolate mappings from their own schemes, through iDD, to scheme A and know that this will be considered authoritative by scheme A.

• Anyone contributing terms to the iDD can specify who is allowed to see or specify their own terms.

• Some terms will be accessible to all:

– e.g. ONIX, some kernel DOI terms, and the MPEG-21 RDD.
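The status information described on this slide could be carried alongside each term roughly as follows. This is a sketch under assumptions: the record layout, the field names and the rule in `is_assured` are one possible reading of “reciprocally agreed”, and the concept ID and origin labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMapping:
    scheme: str
    term: str
    concept: str             # ASSUMPTION: hypothetical dictionary concept ID
    origin: str              # who contributed the term
    agreed_by_scheme: bool   # has the scheme's owner agreed to this mapping?


def is_assured(a, b):
    """Treat a scheme-to-scheme mapping as assured only if both terms map to
    the same concept and both scheme owners have agreed their mapping."""
    return a.concept == b.concept and a.agreed_by_scheme and b.agreed_by_scheme


local = TermMapping("NormanRights", "Writer", "dd:creator-of-text", "local user", True)
agreed = TermMapping("ONIX", "Author", "dd:creator-of-text", "scheme owner", True)
draft = TermMapping("ONIX", "Author", "dd:creator-of-text", "third party", False)

print(is_assured(local, agreed))
print(is_assured(local, draft))
```

The second check fails only because the mapping on the ONIX side has not been agreed, which is the distinction between a mapping that merely exists and one that scheme A would treat as authoritative.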

Page 26:

Construction

• Based on DD methodology and Contextual Ontologyx Architecture tools, with terms from various sources (ONIX, RDD, DOI)

• …But users need not understand the underlying concepts and construction of the iDD.

– It is no more a requirement to know the details than it is for the designer of a web page to read all the underlying internet protocol RFCs.

• A fundamental role of the IDF and others with the iDD is to provide assurance to users that the work has been peer-reviewed and tested, and to make tools available.

• Some key features are:

– Extensible and granular to whatever level of detail is required.

– Multiple, different, specialized views are available: these include a Rights Model, based on a set of specialized Contexts.

– Local terms: local (internal) data elements and names can be added into the ontology

– External terms: incorporates external and standard schemes such as ISO territory, currency and language codes, and sector-specific external schemes

Page 27:

Use

• Current use of the dictionary is on a project-by-project basis using technical consultancy

• An automated web-based look-up system for the Dictionary is under development for IDF use (and potentially others e.g. RDD)

• Access will be granular: those with authority to access the Dictionary will be able to view what is appropriate, and private terms are kept confidential.

Page 28:

Development of <indecs> 1998-2004

[Figure: timeline of what was developed and by whom (in the original slide, black = what, red = who). Labels: “indecs (2000)”; “EC plus many others: indecs Framework”; “CONTECS (2001+)”; “IFPI/RIAA, MPA, IDF, DentsuMMG, Rightscom: methodology for DD”; “2004”; “Data dictionaries”; “ISO MPEG21 RDD”; “IDF is authority”; “indecsDD”; “IDF + ONIX”; “OntologyX”; “RightsCom (Mi3p etc)”.]

Page 29:

The <indecs> Data Dictionary

Norman Paskin, International DOI Foundation

ELECTRONIC COMMUNICATION OF LICENCE TERMS