Metadata Mark-up and Management
Jan 20, 2016
© Adolf Knoll, National Library of the Czech Republic
Metadata
Metadata is added value to digital files. It forms a container for them in order to:
- identify them
- enable easier access and navigation
- control the entire compound document
- enable archival storage
- enable research work and publication, even of critical editions, etc.
Compound Document
A document consisting of interconnected metadata and data files:
- the metadata are added descriptions (mostly pieces of text)
- the data are any external files produced by digitizing parts of the original documents (images, texts, sound files, even video files)
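To make the structure concrete, here is a minimal Python sketch of a compound document as two lists of files, where each metadata file records which data files it describes. The class names (DataFile, MetadataFile, CompoundDocument) and the undescribed_data check are hypothetical illustrations, not part of any system described in these slides.

```python
from dataclasses import dataclass, field

# Hypothetical model of a compound document; all names are illustrative.
@dataclass
class DataFile:
    path: str
    kind: str  # "image", "text", "sound" or "video"

@dataclass
class MetadataFile:
    path: str
    describes: list[str] = field(default_factory=list)  # paths of the data files it describes

@dataclass
class CompoundDocument:
    metadata: list[MetadataFile] = field(default_factory=list)
    data: list[DataFile] = field(default_factory=list)

    def undescribed_data(self) -> list[str]:
        """Return the paths of data files that no metadata file points to."""
        described = {p for m in self.metadata for p in m.describes}
        return [d.path for d in self.data if d.path not in described]

doc = CompoundDocument(
    metadata=[MetadataFile("page001.xml", ["page001.jpg"])],
    data=[DataFile("page001.jpg", "image"), DataFile("page002.jpg", "image")],
)
print(doc.undescribed_data())  # ['page002.jpg']
```

A check like undescribed_data is one simple way to keep the metadata and data halves of the document consistent as files are added.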
What is described?
OBJECTS
- of which the document consists and which build the document
- which have an unchanging substance
- whose representations can vary in their different occurrences
- which can have some important additional characteristics
Object OISEAU
Names: BIRD, PTÁK, VOGEL
Instances (English / Czech / German):
- Cock / Kohout / Hahn
- Eagle / Orel / Adler
- Penguin / Tučňák / Pinguin
- Falcon / Sokol / Falke
- Duck / Kachna / Ente
Objects
They are defined by the creator or interpreter of the document.
They can be built from any sequence or amount of bits in the metadata or data areas.
It should be established which types of objects must be distinguished and how they should be marked.
Object OISEAU
We have decided to have such an object (an animal with wings and feathers that lays eggs)
We have decided to mark anything having these characteristics as OISEAU
We know that this object has different names in different languages (bird, pták, Vogel, птица, pasăre, …)
We know that in reality only concrete birds appear (duck, cock, falcon, penguin, eagle, …)
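The relation between one abstract object, its names in several languages and its concrete instances can be sketched as a lookup table. The names and instances below come from the slides; the dictionary layout and the object_for function are assumptions made only for illustration.

```python
from typing import Optional

# The abstract object OISEAU: its names in several languages and its concrete
# instances, as listed on the slides. The dictionary layout is an assumption.
OISEAU = {
    "names": {"en": "bird", "cs": "pták", "de": "Vogel", "ru": "птица", "ro": "pasăre"},
    "instances": {
        "cock": {"cs": "kohout", "de": "Hahn"},
        "eagle": {"cs": "orel", "de": "Adler"},
        "penguin": {"cs": "tučňák", "de": "Pinguin"},
        "falcon": {"cs": "sokol", "de": "Falke"},
        "duck": {"cs": "kachna", "de": "Ente"},
    },
}

def object_for(word: str) -> Optional[str]:
    """Return "OISEAU" if the word names the object or one of its instances."""
    labels = {name.lower() for name in OISEAU["names"].values()}
    for english, translations in OISEAU["instances"].items():
        labels.add(english)
        labels.update(t.lower() for t in translations.values())
    return "OISEAU" if word.lower() in labels else None

print(object_for("Adler"))  # OISEAU
print(object_for("house"))  # None
```

Whatever word or language a source uses, the mark-up refers to the same single object, which is the point of deciding on OISEAU first.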
Objects and contents
Semantically poor content
• formal object (paragraph, heading, note, …)
• used for formatting
• languages built on these objects are used for output (HTML, MS Word, …)
• PRESCRIPTIVE MARK-UP

Semantically rich content
• content-oriented object (author, flower, house, …)
• used for understanding
• languages built on these objects are used for description (MARC, TEI, EAD, DOBM, …)
• DESCRIPTIVE MARK-UP
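The practical difference shows up as soon as a program has to find something. In this sketch the same facts are marked up both ways. Both snippets are illustrative and use Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# The same facts in two illustrative mark-ups.
prescriptive = "<p><b>Karel Šefelín</b>, Hronov</p>"
descriptive = (
    "<record><Publisher>Karel Šefelín</Publisher>"
    "<PlaceOfPublication>Hronov</PlaceOfPublication></record>"
)

# Descriptive mark-up: the publisher is found by what it means.
publisher = ET.fromstring(descriptive).findtext("Publisher")

# Prescriptive mark-up: <b> only says "bold"; nothing identifies a publisher,
# and the place is not marked at all.
looks_bold = ET.fromstring(prescriptive).findtext("b")
has_publisher = ET.fromstring(prescriptive).find(".//Publisher") is not None

print(publisher)      # Karel Šefelín
print(has_publisher)  # False
```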
SGML (Standard Generalized Markup Language)
• a general language for marking objects
• to be applied, it needs to be made more concrete (this is done via a DTD)
• thus, second-level applications can be written
• these applications are used directly, or they require additional definitions (DTDs)
• SGML applications: HTML, XML, TEI, …
DESIGNING OUR PROJECT
What do we need?
- open communication
- internal precision and cohesion of mark-up
- multiple output, reuse of marked data, freedom to add new marked data
- complex document control and management
- an open and flexible content-oriented description principle
What do we work with?
For a 300-page manuscript, we work with:
- more than 1500 digital data files produced through digitization (five quality levels: Gallery, Preview, Internet, User, Excellent; 300 × 5 = 1500, plus images for covers, end-sheets, ...)
- more than 300 descriptive metadata files (one for each digitized piece of the original, plus files for bibliographic and technical descriptions and technological files)
This means that the requirements mentioned above must be applied to a complex document consisting of more than 1800 computer files, which play various roles.
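The counts above follow from simple arithmetic. This sketch assumes one image per page at each of the five quality levels. The parameters extra_images and other_files are hypothetical; they stand for the covers, end-sheets and additional description files, whose numbers are not given.

```python
# Quality levels named on the slide. Whether cover and end-sheet images are also
# produced at every level is not stated, so they are a separate, assumed count.
QUALITY_LEVELS = ["Gallery", "Preview", "Internet", "User", "Excellent"]

def data_file_count(pages: int, extra_images: int = 0) -> int:
    """One image per page at each quality level, plus any extra images."""
    return pages * len(QUALITY_LEVELS) + extra_images

def metadata_file_count(pages: int, other_files: int = 0) -> int:
    """One description file per digitized page, plus bibliographic, technical
    and technological files (their number is an assumed parameter)."""
    return pages + other_files

print(data_file_count(300))      # 1500
print(metadata_file_count(300))  # 300
```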
Independence
Metadata should be independent of display – pure values
We must know:
- which features of objects to describe – we need DESCRIPTION RULES
- how to mark up these objects – we need RULES FOR MARK-UP
- how to formalize which objects will be described and how – we need a DTD
- how to display the compound document – we need rules for display (transformation rules)
If the platform is SGML or XML, we write a DTD and XSL tools.
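The separation between values and display can be sketched in Python. One record holds only values, and two separate functions act as transformation rules that render it in different ways. In an SGML or XML setting those rules would be XSL stylesheets; the Python functions and their output formats here are only illustrative stand-ins.

```python
# Metadata stored as pure values: nothing here says how it should look.
record = {
    "TypeOfDocument": "postcard",
    "PlaceOfPublication": "Hronov",
    "Publisher": "Karel Šefelín",
    "Date": "1914",
}

# Two independent, hypothetical sets of display rules for the same values.
def as_label_list(rec: dict) -> str:
    return "\n".join(f"{name}: {value}" for name, value in rec.items())

def as_citation(rec: dict) -> str:
    return (f"{rec['PlaceOfPublication']}: {rec['Publisher']}, "
            f"{rec['Date']} ({rec['TypeOfDocument']})")

print(as_citation(record))  # Hronov: Karel Šefelín, 1914 (postcard)
```

Adding a third output format means adding a third function. The record itself never changes.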
Description elements: author; type of document; place; place of publishing; publisher; date; addressee; annotation
Example, a postcard:
- author: (not given)
- type of document: postcard
- place: Hronov
- place of publishing: Hronov
- publisher: Karel Šefelín
- date: 1914
- addressee: František Bittnar
- annotation: Streets of Hronov in 1914; postcard written by my great-grandmother to her husband, who was doing military service
However, better rules may already exist, e.g. AACR2, which defines how to describe a postcard. We should adopt them, or some other widely applied approach, rather than this proposal of ours.
How should the elements be marked up?
In the DTD: <!ELEMENT PlaceOfPublication (#PCDATA)>
In the metadata file: <PlaceOfPublication>Hronov</PlaceOfPublication>
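The declaration (#PCDATA) means that PlaceOfPublication may contain text but no nested elements. Python's standard library does not validate against DTDs, so the function below imitates only that one rule, for illustration. Full validation would need a validating parser; lxml, for example, provides a DTD class for this.

```python
import xml.etree.ElementTree as ET

def is_text_only(element_xml: str) -> bool:
    """Rough stand-in for the (#PCDATA) content model: the element may contain
    character data but no child elements. This is not a DTD validator."""
    element = ET.fromstring(element_xml)
    return len(element) == 0  # len() of an Element counts its child elements

print(is_text_only("<PlaceOfPublication>Hronov</PlaceOfPublication>"))         # True
print(is_text_only("<PlaceOfPublication><b>Hronov</b></PlaceOfPublication>"))  # False
```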
Write
<Record> <!-- hypothetical root element: a well-formed XML file needs exactly one -->
  <Author></Author>
  <TypeOfDocument>postcard</TypeOfDocument>
  <Place>Hronov</Place>
  <PlaceOfPublication>Hronov</PlaceOfPublication>
  <Publisher>Karel Šefelín</Publisher>
  <Date>1914</Date>
  <Addressee>František Bittnar</Addressee>
  <Annotation>Streets of Hronov in 1914; postcard written by my great-grandmother to her husband, who was doing military service</Annotation>
</Record>
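A quick way to check that such a file is at least well-formed is to parse it. This sketch uses Python's xml.etree.ElementTree. The root element name Record is a hypothetical choice, added because XML requires a single root. The sketch also shows that the unclosed <Date>1914<Date> form is rejected.

```python
import xml.etree.ElementTree as ET

# The postcard record with its closing tags corrected; <Record> is a
# hypothetical root element name.
RECORD = """<Record>
  <Author></Author>
  <TypeOfDocument>postcard</TypeOfDocument>
  <Place>Hronov</Place>
  <PlaceOfPublication>Hronov</PlaceOfPublication>
  <Publisher>Karel Šefelín</Publisher>
  <Date>1914</Date>
  <Addressee>František Bittnar</Addressee>
  <Annotation>Streets of Hronov in 1914; postcard written by my great-grandmother to her husband, who was doing military service</Annotation>
</Record>"""

def read_record(xml_text: str) -> dict:
    """Map each child element name to its text; empty elements give ""."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    return {child.tag: child.text or "" for child in root}

fields = read_record(RECORD)
print(fields["Publisher"])  # Karel Šefelín

# An unclosed <Date> element makes the document not well-formed.
try:
    read_record("<Record><Date>1914<Date></Record>")
except ET.ParseError as err:
    print("not well-formed:", err)
```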
Publish
XSL transformation of the XML files … in order to display them
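Python's standard library has no XSLT processor. lxml's XSLT class can apply a real stylesheet, so this hand-written function is only a stand-in that shows what such a transformation does: it reads the descriptive mark-up and builds new, display-oriented mark-up from it. The HTML definition-list output is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

def to_html(record_xml: str) -> str:
    """Render a flat metadata record as an HTML definition list.
    A hand-written Python stand-in for an XSL stylesheet, not XSLT."""
    record = ET.fromstring(record_xml)
    dl = ET.Element("dl")
    for elem in record:
        if elem.text and elem.text.strip():  # skip empty elements such as <Author></Author>
            ET.SubElement(dl, "dt").text = elem.tag
            ET.SubElement(dl, "dd").text = elem.text.strip()
    return ET.tostring(dl, encoding="unicode")  # special characters are escaped

print(to_html("<Record><Author></Author><Place>Hronov</Place><Date>1914</Date></Record>"))
# <dl><dt>Place</dt><dd>Hronov</dd><dt>Date</dt><dd>1914</dd></dl>
```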
Index them with a database tool to provide even better access
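One simple way to index descriptive metadata is a table with one row per record, element and value, so any element can be searched. This sketch uses Python's built-in sqlite3 with an in-memory database. The table layout and the record file names are hypothetical.

```python
import sqlite3

def build_index(records: dict) -> sqlite3.Connection:
    """Index {record file: {element: value}} as one row per non-empty value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE idx (record TEXT, element TEXT, value TEXT)")
    con.execute("CREATE INDEX idx_element_value ON idx (element, value)")
    con.executemany(
        "INSERT INTO idx VALUES (?, ?, ?)",
        [(rec, el, val) for rec, fields in records.items()
         for el, val in fields.items() if val],
    )
    return con

def find(con: sqlite3.Connection, element: str, value: str) -> list:
    """Return the record files whose given element has exactly this value."""
    rows = con.execute(
        "SELECT DISTINCT record FROM idx WHERE element = ? AND value = ? ORDER BY record",
        (element, value),
    )
    return [row[0] for row in rows]

# Hypothetical records, for illustration only.
con = build_index({
    "postcard-001.xml": {"PlaceOfPublication": "Hronov", "Date": "1914", "Author": ""},
    "postcard-002.xml": {"PlaceOfPublication": "Praha", "Date": "1914"},
})
print(find(con, "Date", "1914"))                  # ['postcard-001.xml', 'postcard-002.xml']
print(find(con, "PlaceOfPublication", "Hronov"))  # ['postcard-001.xml']
```

Because the element names come from the descriptive mark-up, the index needs no change when new element types are added.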
Link metadata with image data
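Linking can be as simple as deriving, from each page's metadata, the paths of its image files at every quality level, then checking that those files exist. The directory-per-level layout and the page identifiers below are hypothetical. Only the five quality level names come from the slides.

```python
from pathlib import PurePosixPath

# Hypothetical storage layout: one directory per quality level, one JPEG per page.
QUALITY_LEVELS = ["Gallery", "Preview", "Internet", "User", "Excellent"]

def image_links(page_id: str) -> dict:
    """Link one page's metadata to its image file at every quality level."""
    return {level: str(PurePosixPath(level) / f"{page_id}.jpg") for level in QUALITY_LEVELS}

def broken_links(page_id: str, existing_files: set) -> list:
    """Return linked image paths that do not exist in the given file set."""
    return [path for path in image_links(page_id).values() if path not in existing_files]

print(image_links("p001")["Excellent"])  # Excellent/p001.jpg
print(broken_links("p001", {"Gallery/p001.jpg", "Preview/p001.jpg"}))
# ['Internet/p001.jpg', 'User/p001.jpg', 'Excellent/p001.jpg']
```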
This is work for professionals
Tools
- simple browsing
- Internet access tools