Metadata Metadata Mark-up and Management © Adolf Knoll, National Library of the Czech Republic.

Post on 20-Jan-2016

Metadata

Metadata is value added to digital files, for which it forms a container:
• to identify them
• to enable easier access and navigation
• to control the entire compound document
• to enable archival storage
• to enable research work and publication, even of critical editions, etc.

Compound Document

A document consisting of interconnected metadata and data files:
• the metadata are added descriptions (mostly pieces of text)
• the data are any external files produced by digitizing pieces of original documents (images, texts, sound files, even video files)

What is described?

OBJECTS
- of which the document consists and which build the document
- which have their unchanging substance
- whose representations can vary in their different occurrences
- which can have some important additional characteristics

Object OISEAU

BIRD / PTÁK / VOGEL
• Cock / Kohout / Hahn
• Eagle / Orel / Adler
• Penguin / Tučňák / Pinguin
• Falcon / Sokol / Falke
• Duck / Kachna / Ente

Objects

They are defined by the creator or interpreter of the document

They can be built from any sequence or amount of bits in metadata or data areas

It should be established:
• which types of objects must be distinguished
• how they should be marked

Object OISEAU

We have decided to have such an object (animal with wings, feathers, laying eggs)

We have decided to mark anything having these characteristics as OISEAU

We know that this object has different names in different languages (bird, pták, Vogel, птица, pasăre, …)

We know that in reality only concrete birds appear (duck, cock, falcon, penguin, eagle, …)
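The OISEAU idea can be sketched as a small data structure: one fixed tag, several language-dependent names, and a set of concrete instances. This is only an illustration. The class `ObjectType`, the function `tag_for`, and the language codes used as keys are assumptions made for the example; the slides give only the names themselves.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """One marked object type: a fixed tag with language-dependent names."""
    tag: str
    # Hypothetical layout: language code -> name in that language.
    names: dict = field(default_factory=dict)
    # Concrete occurrences, each again with language-dependent names.
    instances: dict = field(default_factory=dict)


OISEAU = ObjectType(
    tag="OISEAU",
    names={"en": "bird", "cs": "pták", "de": "Vogel", "ru": "птица", "ro": "pasăre"},
    instances={
        "cock": {"en": "cock", "cs": "kohout", "de": "Hahn"},
        "eagle": {"en": "eagle", "cs": "orel", "de": "Adler"},
        "penguin": {"en": "penguin", "cs": "tučňák", "de": "Pinguin"},
        "falcon": {"en": "falcon", "cs": "sokol", "de": "Falke"},
        "duck": {"en": "duck", "cs": "kachna", "de": "Ente"},
    },
)


def tag_for(word, object_types):
    """Return the tag of the object type that `word` names, or None."""
    needle = word.casefold()
    for obj in object_types:
        labels = list(obj.names.values())
        for instance_names in obj.instances.values():
            labels.extend(instance_names.values())
        if needle in (label.casefold() for label in labels):
            return obj.tag
    return None


print(tag_for("Tučňák", [OISEAU]))  # OISEAU
```

Whatever the language or the concrete bird, the mark stays the same; only the representation varies.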

Objects and contents

Semantically poor content

• formal object (paragraph, heading, note, …)

• used for formatting

• languages built on these objects are used for output (HTML, MS WORD, …)

• PRESCRIPTIVE MARK-UP

Semantically rich content

• content-oriented object (author, flower, house, …)

• used for understanding

• languages built on these objects are used for description (MARC, TEI, EAD, DOBM, …)

• DESCRIPTIVE MARK-UP

SGML: Standard Generalized Markup Language

• a general language to mark objects
• to be applied, it needs to become more concrete (this is done via a DTD)
• thus, second-level applications can be written
• these applications are used directly or they require additional definitions (DTDs)
• SGML applications: HTML, XML, TEI, …

DESIGNING OUR PROJECT

What do we need?

• Open communication
• Internal precision and cohesion of markup
• Multiple output, reuse of marked data, liberty to add new marked data
• Complex document control and management
• Open and flexible content-oriented description principle

What do we work with?

For a manuscript of 300 pages, we work with:
• more than 1500 digital data files produced through digitization (Gallery, Preview, Internet, User, and Excellent quality levels: 300x5 + images for covers, end-sheets, ...)
• more than 300 description metadata files (one for each digitized piece of the original + files for bibliographic and technical descriptions + technological files)

This means that the above mentioned requirements must be applied to a complex document consisting of hundreds of computer files, which play various roles.
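The file counts above follow from simple arithmetic, sketched here. The function name `data_file_count` and the parameter `extra_leaves` are hypothetical. The slides do not say how many covers and end-sheets there are, nor whether they are stored at all five quality levels, so both are assumptions.

```python
# Quality levels named in the slides.
QUALITY_LEVELS = ["Gallery", "Preview", "Internet", "User", "Excellent"]


def data_file_count(pages, levels, extra_leaves=0):
    """Image files for a digitized volume.

    Assumes every page, and every extra leaf (cover, end-sheet, ...),
    is stored once per quality level.
    """
    return (pages + extra_leaves) * levels


# 300 pages at 5 quality levels, before covers and end-sheets are counted.
print(data_file_count(300, len(QUALITY_LEVELS)))  # 1500
```

Any non-zero `extra_leaves` pushes the total above 1500, which matches the "more than 1500" in the text.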

Independence

Metadata should be independent of display: pure values.

We must know:
• which features of objects to describe: we need DESCRIPTION RULES
• how to mark up these objects: we need RULES for MARK-UP
• how to formalize which objects will be described and how: DTD
• how to display the compound document: we need rules for display (transformation rules)

If the platform is SGML or XML, we write DTD and XSL tools.

type of document; place; place of publishing; publisher; date; addressee

description elements

author:
type of document: postcard
place: Hronov
place of publishing: Hronov
publisher: Karel Šefelín
date: 1914
addressee: František Bittnar
annotation: Streets of Hronov in 1914; postcard written by my great-grandmother to her husband, who was doing military service

However, there may be better rules, e.g. AACR2, defining how to describe a postcard. We should adopt them, or some other approach more widely applied than this proposal of ours.

how to mark the elements?

In DTD: <!ELEMENT PlaceOfPublication (#PCDATA)>

In Metadata File: <PlaceOfPublication>Hronov</PlaceOfPublication>

write

<Author></Author>
<TypeOfDocument>postcard</TypeOfDocument>
<Place>Hronov</Place>
<PlaceOfPublication>Hronov</PlaceOfPublication>
<Publisher>Karel Šefelín</Publisher>
<Date>1914</Date>
<Addressee>František Bittnar</Addressee>
<Annotation>Streets of Hronov in 1914; postcard written by my great-grandmother to her husband, who was doing military service</Annotation>
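A record like this can be checked mechanically. The sketch below is not DTD validation: Python's standard `xml.etree.ElementTree` only tests well-formedness, so a plain set of allowed element names stands in for the DTD. The `<Record>` root element, the `ALLOWED` set, and the function `check_record` are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Element names from the sample record; a stand-in for the DTD declarations.
ALLOWED = {
    "Author", "TypeOfDocument", "Place", "PlaceOfPublication",
    "Publisher", "Date", "Addressee", "Annotation",
}

# <Record> is a hypothetical root; the slides show only the child elements.
RECORD = """<Record>
  <Author></Author>
  <TypeOfDocument>postcard</TypeOfDocument>
  <Place>Hronov</Place>
  <PlaceOfPublication>Hronov</PlaceOfPublication>
  <Publisher>Karel Šefelín</Publisher>
  <Date>1914</Date>
  <Addressee>František Bittnar</Addressee>
  <Annotation>Streets of Hronov in 1914</Annotation>
</Record>"""


def check_record(xml_text):
    """Return a list of problems; an empty list means the record passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Raised, for example, when a closing tag lacks its slash.
        return [f"not well-formed: {err}"]
    problems = []
    for child in root:
        if child.tag not in ALLOWED:
            problems.append(f"unknown element: {child.tag}")
        if len(child) > 0:
            # Mirrors (#PCDATA): text only, no nested elements.
            problems.append(f"{child.tag} must contain text only")
    return problems


print(check_record(RECORD))  # []
```

A validating parser with a real DTD would go further, for instance by enforcing element order and required elements.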

publish

XSL transformation of the XML files … in order to display them

Index with a database tool to provide even better access

Link metadata with image data
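The three publishing steps can be sketched in a few lines. Python stands in here for both the XSL transformation and the database tool, so this is an illustration of the idea and not the workflow the slides describe. The `<Record>` root, the record id `pc1`, the file name `postcard.jpg`, and the functions `to_html` and `build_index` are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened sample record; <Record> is a hypothetical root element.
RECORD = """<Record>
  <TypeOfDocument>postcard</TypeOfDocument>
  <Place>Hronov</Place>
  <Publisher>Karel Šefelín</Publisher>
  <Date>1914</Date>
  <Annotation>Streets of Hronov in 1914; postcard</Annotation>
</Record>"""


def to_html(xml_text, image_path):
    """Render the pure values as an HTML table linked to one image file."""
    rows = []
    for child in ET.fromstring(xml_text):
        value = (child.text or "").strip()
        rows.append(
            f"<tr><th>{html.escape(child.tag)}</th>"
            f"<td>{html.escape(value)}</td></tr>"
        )
    image = f'<img src="{html.escape(image_path, quote=True)}">'
    return image + "\n<table>\n" + "\n".join(rows) + "\n</table>"


def build_index(records):
    """Map each case-folded word to the ids of the records containing it."""
    index = defaultdict(set)
    for record_id, xml_text in records.items():
        for child in ET.fromstring(xml_text):
            for word in (child.text or "").split():
                key = word.strip(".,;").casefold()
                if key:
                    index[key].add(record_id)
    return index


page = to_html(RECORD, "postcard.jpg")
index = build_index({"pc1": RECORD})
print(sorted(index["hronov"]))  # ['pc1']
```

Because the metadata file holds only values, the same record feeds both the display and the index without modification.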

This is work for professionals

tools

Simple browsing

Internet access tools
