Metadata: an introduction
Michael Day
UKOLN, University of Bath
[email protected]
Managing Networks: Understanding New Technologies, Birmingham, 13 September 2001
Presentation overview
• Defining “metadata”
• Dublin Core:
– Background
– Exercise 1
– Semantics
– Syntax
– Content Rules
– Exercise 2
Metadata (1)
Some definitions:
– “data about data”
– “Internet-age term for structured data about data” - Joint NSF-EU Working Group on Metadata (1998)
– “... Machine understandable information about web resources or other things” - Berners-Lee (W3C)
Functional definition:
– structured data about resources that can be used to help support a wide range of operations
Metadata (2)
These operations may include:
• resource discovery and access
• rights management
• e-commerce
• authentication
• collection management
• preservation
Metadata (3)
Resource discovery metadata:
• Provides support for:
– searching
– location
– retrieval (delivery)
– description
• May help enable:
– semantic interoperability
Metadata (4)
Where is metadata stored?
• Different models of metadata-resource association:
– embedded within resource
– tightly coupled using protocols or identifiers
– separate database(s)
Metadata formats (1)
Diversity of metadata formats and frameworks
• How many have you heard of?
Diversity of metadata formats and frameworks, e.g.:
• Dublin Core
• EAD, CIMI, TEI
• PICS, RDF
• MARC
• GILS, FGDC
• ROADS
http://www.ukoln.ac.uk/metadata/glossary/
Metadata formats (2)
SCHEMAS Forum project “Metadata Watch” has already identified:
• Over 200 implementation activities
• Around 90 standardisation activities
• Very different levels of information about the various initiatives
Metadata formats (3)
USMARC:
245 00 Worldnews online $h [computer file].
246 3 World news online
256 Computer online service.
260 Washington, D.C. : $b Worldnews Online, $c [1995-
538 Mode of access: Internet.
500 Title from title frame.
520 “WorldNews OnLine is a service ...”
650 0 Newspapers $x Databases.
856 7 $u http://worldnews.net $2 http
Metadata formats (4)
TEI header:
<teiHeader type="aacr2"><fileDesc><titleStmt>
<title type="245">Rubaiyat of Omar Khayyam : the astronomer poet of Persia / rendered into English verse by Edward Fitzgerald ; with drawings by Florence Lundborg</title>
<title type="gmd">[electronic resource]</title>
<author>Omar Khayyam</author>
<respStmt>
<resp>Conversion to TEI.2-conformant markup:</resp>
<name>University of Virginia Library Electronic Text Center </name>
</respStmt>
Metadata formats (5)
ROADS/IAFA template:
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
Admin-Email-v1: [email protected]
Publisher-Name-v1: Wellcome Unit for the History of Medicine
Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
Publisher-City-v1: Oxford
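The template above is a flat list of "Attribute: value" lines, which makes it easy to read by machine. A minimal Python sketch, assuming one attribute per line and a colon-plus-space separator; the function name parse_template is hypothetical and is not part of the ROADS software:

```python
def parse_template(text):
    """Parse 'Attribute: value' lines into a dict (illustrative only)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, so that URLs in the value stay intact.
        name, separator, value = line.partition(": ")
        if separator:
            record[name] = value
    return record


# A few lines taken from the template shown on the slide.
sample = """Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
Publisher-City-v1: Oxford"""

record = parse_template(sample)
print(record["Title"])
```

A dict keeps only the last value of a repeated attribute; a real reader would probably need a list of pairs instead.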
A metadata typology
Based on: Dempsey and Heery (1998)
Bands run from simple to rich:
• Band One (full text indexes): proprietary formats
• Band Two (simple structured generic formats): proprietary formats, Dublin Core, ROADS, IAFA/Whois++ templates
• Band Three (more complex structure, domain specific; part of larger semantic framework): FGDC, MARC, TEI headers, ICPSR, EAD, CIMI
Who creates metadata?
Resource creators
• authors
• webmasters
• institutions
Service providers
• search services
• third parties
• commercial publishers
• hand crafted
• robot/database generated
Metadata creation tools
DC-dot: http://www.ukoln.ac.uk/metadata/dcdot/
Nordic Metadata Project Metadata Template: http://www.lub.lu.se/cgi-bin/nmdc.pl
Reggie Metadata Editor: http://metadata.net/dstc/
Aspects of metadata
Syntax
• related to the technical implementation
– e.g. MARC, XML
Semantics
• the basic meaning of elements
Rules for content
• e.g. cataloguing rules
The Dublin Core Metadata Element Set
Dublin Core (1)
What is it?
• 15 element metadata set
• based on international consensus
• Some initial assumptions:
– simple set for untrained creators
– basic set for semantic interoperability or resource discovery
– primarily for Web-based document-like objects
http://www.dublincore.org/
Dublin Core (2)
Dublin Core Metadata Initiative
• Workshop series
– first workshop hosted by OCLC in Dublin, Ohio (1995)
– 9th workshop (DC2001) will be held in October (Tokyo)
• Working Groups
– for DC issues (e.g. Architecture, Registry, Standards, Tools, etc.)
– for specific user communities (e.g. Libraries, Education, Government, etc.)
– open e-mail discussion lists
Dublin Core (3)
Dublin Core Metadata Element Set:
• Version 1.0 (RFC 2413, 1998)
• Version 1.1 (1999)
– approved (Z39.85) by the US National Information Standards Organization (NISO) as a Draft American National Standard (July 2001)
Dublin Core Qualifiers:
• DCMI Recommendation (2000)
DC exercise 1
The Dublin Core Metadata Element Set consists of 15 elements, designed for simple resource discovery.
What elements do you think should be part of such a metadata element set?
• Think about the type of resources that need to be described:
– Web pages
– Document-like objects
– Images, sound resources, etc.
– Multimedia resources
Dublin Core semantics
DC semantics (1)
15 element core metadata set:
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
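One way to make the 15 element names on this slide concrete is to check a record against them. A small Python sketch; the names DC_ELEMENTS and unknown_elements are illustrative assumptions, not part of any Dublin Core tool:

```python
# The 15 element names, as listed on the slide.
DC_ELEMENTS = {
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def unknown_elements(record):
    """Return, in sorted order, the keys of record that are not DC elements."""
    return sorted(set(record) - DC_ELEMENTS)


record = {
    "Title": "UKOLN",
    "Creator": "UKOLN Information Services Group",
    "Author": "Michael Day",  # not a DC element name; Creator is the DC term
}

print(unknown_elements(record))
```

The check is case-sensitive and covers names only; it says nothing about whether the values are sensible.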
DC semantics (2)
An example:
– Name: Description
– Identifier: Description
– Definition: An account of the content of the resource.
– Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Managing Networks, Birmingham, 13 September 2001
DC semantics (3)
Qualifiers:
• DC semantics are defined very broadly
• Possible to add qualifiers to some elements:
– Element refinement(s):
    – Relation.IsPartOf
    – Date.Created
– Encoding scheme(s):
    – Subject (scheme=DDC)
    – Date (scheme=ISO8601)
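The two kinds of qualifier can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the dotted "Element.Refinement" form is taken from the examples on the slide, the function names are hypothetical, and the date check covers only the YYYY-MM-DD form, which is one of several forms ISO 8601 allows.

```python
import re
from datetime import date


def split_qualified(name):
    """Split 'Element.Refinement' into (element, refinement or None)."""
    element, _, refinement = name.partition(".")
    return element, (refinement or None)


def is_iso8601_day(value):
    """True if value is a valid calendar date written as YYYY-MM-DD."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if not match:
        return False
    try:
        date(*map(int, match.groups()))
    except ValueError:  # e.g. month 13 or day 32
        return False
    return True


print(split_qualified("Relation.IsPartOf"))
print(split_qualified("Title"))
print(is_iso8601_day("2001-09-13"))
print(is_iso8601_day("13 September 2001"))
```

An element refinement narrows the meaning of an element, while an encoding scheme constrains how the value is written; the sketch keeps the two concerns in separate functions for that reason.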
DC syntax
DC syntax (1)
Can be embedded into HTML Web pages:
• <META> tag
• limited functionality
• the data can be “harvested” by metadata-aware search engines (but not many do this)
• note that this is just one way of implementing the DC element set
DC syntax (2)
An example of embedding DC metadata in HTML 4.0:
<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN">
<meta name="DC.Description" content="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head>
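To show what "harvesting" embedded metadata might involve, here is a minimal Python sketch using the standard library's html.parser module. The class name DCMetaHarvester is hypothetical, and matching the "DC." prefix case-insensitively is a choice of this sketch rather than a rule from the DC specification.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect (element, content) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            # Drop the three-character "DC." prefix and keep the element name.
            self.records.append((name[3:], attributes.get("content") or ""))


# Shortened version of the page shown on the slide.
page = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head>"""

harvester = DCMetaHarvester()
harvester.feed(page)
for element, content in harvester.records:
    print(element, "=", content)
```

The ordinary <title> tag is ignored; only the <meta> tags whose name starts with "DC." are collected.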
DC content rules
Not part of DCMI:
• No content rules (cataloguing rules) defined as part of Dublin Core Metadata Element Set
May be important where there are expectations of consistent cross-searching across related services, e.g.:
• ROADS Cataloguing Guidelines
• Resource Discovery Network (RDN) Cataloguing Guidelines
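A content rule governs how a value is written, not what the element means. As an illustration only, the Python sketch below applies one naive rule, entering personal names in inverted form. The function invert_name is hypothetical and is not taken from the ROADS or RDN guidelines; real cataloguing rules also cover compound surnames, titles and corporate names, which this sketch does not handle.

```python
def invert_name(name):
    """Turn 'Forename Surname' into 'Surname, Forename' (naive rule)."""
    parts = name.split()
    if len(parts) < 2 or "," in name:
        # Single word, or already in inverted form: leave unchanged.
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(invert_name("Michael Day"))
print(invert_name("Day, Michael"))
```

Applying the same rule across services is what makes a search for a creator's name return consistent results; a corporate name such as "UKOLN Information Services Group" would need a separate rule.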
DC exercise 2
Go to the Nordic Metadata Template at:
http://www.lub.lu.se/cgi-bin/nmdc.pl
And try to create some metadata for a Web page that you know reasonably well
• Reflect on:
– Which bits are difficult to fill in
– Which parts relate to semantics, which to content rules (e.g. inverted forms of names)
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
http://www.ukoln.ac.uk/