Metadata matters: describing a TEI text, its contents, its meanings

Magdalena Turska, James Cummings and many more

July 17th 2014

What is Metadata?

Manchester Library Card Catalogue by ricardo CC-BY 2.0

often called "data about data"

term originally used with electronic data, but its meaning has broadened

data about the content, context, and structure of information resources

much like an electronic version of a title page attached to a printed work

General purposes of metadata

Schlagwortkatalog by Dr. Marcus Gossler CC BY-SA 3.0

supports the identification, retrieval, use, re-use, management, and preservation of information resources

enriches the informational value of an object

can describe a collection, a single resource, or a component part of a larger resource

Who is it for?

Copyright Terry Pratchett and Paul Kidby

Librarians: how to identify and describe "electronic books"

Users and text analysts: what "coding practices" are used within digital resources

Main purposes of the Librarian vs User

Librarians

identify the resource in definitive manner

document its components, media and organization

declare its legal properties (copyright etc.)

Users and text analysts

summarize its logical structure

specify the intended and possible uses

describe the analytical scheme ("codebook") if it exists

summarize its properties and content for the use of search engines

This creates a lot of tension, pulling the header in separate directions

The Librarian's Header

Conforms to a standard bibliographic model, using similar terminology

Organized as a single source of information for bibliographic description of a digital resource, with established mappings to other such records (e.g. MARC, EAD, etc.)

Pressure for greater and more exact constraints to improve precision of description: preference for structured data over loose prose

Librarians want you to adhere to standards and fit into existing databases and search systems. Ooook!

Everyman's Header

Gives a polite nod to common bibliographic practice, but has a far wider scope

Supports a (potentially) huge range of very miscellaneous information, organized in fairly ad hoc or individualistic ways

Many different codes of practice in different user communities

Unpredictable combinations of narrowly encoded documentation systems and loose prose descriptions

Users seek detailed information on possibly every aspect of your source and methodology

Metadata standards

DCMI: Dublin Core Metadata Initiative

RDF: Resource Description Framework

EAD: Encoded Archival Description

METS: Metadata Encoding and Transmission Standard

OAIS: Open Archival Information System

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting

Z39.50: ANSI/NISO standard protocol (ISO 23950)

TEI provides a richer vocabulary than EAD or DCMI, and is less abstract than RDF or METS

Where the TEI stands

The TEI header was designed with both perspectives in mind

TEI requires metadata to be stored inside the XML document, prefixed to the content. This information makes up the TEI header although, as we will see, some of it can be included inside the <body>.

The TEI header <teiHeader> supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
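
As a rough sketch of what "prefixed to the content" means, a TEI document places <teiHeader> before <text> inside the root <TEI> element. The title and the placeholder prose below are illustrative assumptions, not taken from a real file.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- the header comes first: metadata about the file -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A hypothetical sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished teaching example</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no prior source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- the content follows the header -->
  <text>
    <body>
      <p>The transcribed text goes here.</p>
    </body>
  </text>
</TEI>
```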

Types of content in the TEI header

free prose

  prose description: series of paragraphs

  phrase: character data, interspersed with phrase-level elements, but not paragraphs

grouping elements: specialised elements recording some structured information

declarations: elements whose names end with the suffix Decl (e.g. subjectDecl, refsDecl) enclose information about specific encoding practices applied in the electronic text.

descriptions: elements whose names end with the suffix Desc (e.g. <settingDesc>, <projectDesc>) contain a prose description, possibly, but not necessarily, organised under some specific headings by suggested sub-elements.

TEI Header Structure

The TEI header has four main components:

<fileDesc> (file description) contains a full bibliographic description of a computer file. N.B. A "computer file" may actually correspond to several files across different operating systems.

<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.

<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting (just about everything not covered in the other header elements).

<revisionDesc> (revision description) summarizes the revision history for a file.

Only <fileDesc> is required; the others are optional.

General <teiHeader> structure

<teiHeader>
  <fileDesc>
    <titleStmt><!-- title information here --></titleStmt>
    <editionStmt><!-- edition information here --></editionStmt>
    <publicationStmt><!-- publication information here --></publicationStmt>
    <seriesStmt><!-- series related information here --></seriesStmt>
    <notesStmt><!-- any related notes here --></notesStmt>
    <sourceDesc><!-- source description here --></sourceDesc>
  </fileDesc>
  <encodingDesc><!-- encoding description here --></encodingDesc>
  <profileDesc><!-- profile description here --></profileDesc>
  <revisionDesc><!-- revision description here --></revisionDesc>
</teiHeader>

Example Header: Minimal required header

<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A title?</title>
    </titleStmt>
    <publicationStmt>
      <p>Who published?</p>
    </publicationStmt>
    <sourceDesc>
      <p>Where from?</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>

Example Source

Bodleian Library, University of Oxford: John Johnson Collection: Cinemas 1 (43a)

Example Header: Minimal required header

<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>'Through Romantic India' [a machine-readable transcription]</title>
    </titleStmt>
    <publicationStmt>
      <p>John Johnson Collection of Printed Ephemera: Now and Then blog</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from the item with shelfmark Cinemas 1 (43a) located in
        the Bodleian Library, John Johnson Collection of Printed Ephemera</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>

The TEI supports two ‘levels’ or types of header

corpus level metadata sets default properties for everything in a corpus

text level metadata sets specific properties for one component text of a corpus

Corpus Header Example

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Add xmlns and version in <teiCorpus> -->
  <!-- Must contain one TEI header for the corpus. -->
  <teiHeader type="corpus">
    <!-- corpus-level metadata here -->
  </teiHeader>
  <!-- Must contain a series of TEI elements, one for each text. -->
  <TEI>
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
  <TEI>
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
</teiCorpus>

File Description <fileDesc>

has some mandatory elements:

<titleStmt>: provides a title for the resource and any associated statements of responsibility

<sourceDesc>: documents the sources from which the encoded text derives (if any)

<publicationStmt>: documents how the encoded text is published or distributed

and some optional ones:

<editionStmt>: yes, digital texts have editions too

<seriesStmt>: and they also fit into "series"

<extent>: how many floppy disks, gigabytes, files?

<notesStmt>: notes of various types
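
A sketch of how the mandatory and optional children might sit together. The order shown, with <extent> between <editionStmt> and <publicationStmt>, is the one the TEI schema expects inside <fileDesc>; all titles and values are placeholder assumptions.

```xml
<fileDesc>
  <titleStmt>                             <!-- mandatory -->
    <title>A hypothetical pamphlet: an electronic transcription</title>
  </titleStmt>
  <editionStmt>                           <!-- optional -->
    <edition>Second edition</edition>
  </editionStmt>
  <extent>2 files, about 40 KB</extent>   <!-- optional -->
  <publicationStmt>                       <!-- mandatory -->
    <p>Unpublished draft</p>
  </publicationStmt>
  <seriesStmt>                            <!-- optional -->
    <title>Hypothetical Pamphlets Online</title>
  </seriesStmt>
  <notesStmt>                             <!-- optional -->
    <note>Prepared as a teaching example.</note>
  </notesStmt>
  <sourceDesc>                            <!-- mandatory -->
    <p>Transcribed from a printed pamphlet (placeholder description)</p>
  </sourceDesc>
</fileDesc>
```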

File Description <fileDesc> (cont.)

<titleStmt>: contains a mandatory <title> which identifies the electronic file (not its source!)

  optionally followed by additional titles, and by "statements of responsibility", as appropriate, using <author>, <editor>, <sponsor>, <funder>, <principal> or the generic <respStmt>

<publicationStmt>: may contain

  prose, in one or more <p> (e.g. to say the text is unpublished)

  one or more <publisher>, <distributor>, <authority>, each followed by <pubPlace>, <address>, <availability>, <idno>

Title and Responsibility Statements

Within <titleStmt>, you can repeat any of these elements as necessary, and document additional responsibilities with a generic <respStmt>:

<titleStmt>
  <title>Letter to Leslie Gunston</title>
  <author>Wilfred Owen</author>
  <editor>Renée van Baalen</editor>
  <principal>James Cummings</principal>
  <meeting>Digital Humanities at Oxford Summer School</meeting>
  <respStmt>
    <resp>Improved encoding</resp>
    <name>James Cummings</name>
  </respStmt>
</titleStmt>

N.B. The title of the electronic work should be derived from the source text, but clearly distinguishable from it.

At a minimum, identify the author of the text and (where appropriate) the creator of the file or corpus

Edition and Extent statements...

<editionStmt>

can be used to document the details of this particular edition (e.g. date)

optional for the first release, but is mandatory for each later release

<extent>

approximate size of a text stored on some carrier medium or of some other object, digital or non-digital

is sometimes used to document the number of words in a corpus

<editionStmt>
  <edition>First Edition</edition>
</editionStmt>
<extent>6.5 kb</extent>
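
Where a machine-readable size is wanted, <extent> can also wrap <measure> elements. This is a sketch only; the units and quantities below are invented for illustration.

```xml
<extent>
  <!-- @unit and @quantity give a normalized value; the element content is free text -->
  <measure unit="KiB" quantity="6.5">about 6.5 kilobytes</measure>
  <measure unit="words" quantity="1200">roughly 1,200 words</measure>
</extent>
```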

Publication Statement example

<publicationStmt>
  <publisher>TEI @ Oxford</publisher>
  <distributor>Digital Humanities at Oxford Summer School</distributor>
  <authority>James Cummings</authority>
  <pubPlace>
    <address>
      <orgName>IT Services</orgName>
      <street>13 Banbury Road</street>
      <settlement>Oxford</settlement>
      <postCode>OX2 6NN</postCode>
      <country>United Kingdom</country>
    </address>
  </pubPlace>
  <date when="2013-07-09">09 July 2013</date>
  <idno>dhoxss-tei-talk03</idno>
  <availability>
    <licence>Licensed with a <ref target="http://creativecommons.org/licenses/by/3.0/">Creative
      Commons Attribution</ref> licence.</licence>
  </availability>
</publicationStmt>

<publicationStmt> notes

mandatory element

<publisher>, <distributor> and/or <authority> must be present unless the entire publication statement is given as prose

If the creation date is different from the date of publication, the creation date should be given within <profileDesc>, not in the <publicationStmt>

a formal licence may be entered in <licence> included in <availability>
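
A sketch of the split between the two dates: the publication date of the file sits in <publicationStmt>, while the date the text was written sits in <creation> inside <profileDesc>. The title, publisher and dates are assumptions made up for this illustration.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A hypothetical letter: an electronic transcription</title>
    </titleStmt>
    <publicationStmt>
      <publisher>A hypothetical project</publisher>
      <!-- when the electronic file was published -->
      <date when="2014-07-17">17 July 2014</date>
      <availability>
        <licence target="http://creativecommons.org/licenses/by/3.0/">Creative
          Commons Attribution licence</licence>
      </availability>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a manuscript letter (placeholder description)</p>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- when the text itself was written -->
    <creation>
      <date when="1918-05">May 1918</date>
    </creation>
  </profileDesc>
</teiHeader>
```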

Series statement

<seriesStmt> records the series, if any, to which a publication belongs. Series include

separate items that share a collective title applicable to the group

two or more volumes of items, similar in character and issued in sequence

a separately numbered sequence of volumes within a series or serial

Series statement example

<seriesStmt>
  <title level="s">Machine-Readable Texts for the Study of Indian Literature</title>
  <respStmt>
    <resp>ed. by</resp>
    <name>Jan Gonda</name>
  </respStmt>
  <biblScope unit="vol">1.2</biblScope>
  <idno type="ISSN">0 345 6789</idno>
</seriesStmt>

Notes statement

The optional <notesStmt> can contain notes on almost any aspect of the file or its contents:

<notesStmt>
  <note>Transcribed for a TEI Workshop</note>
</notesStmt>

These notes can be short statements, or many paragraphs long. Take care to encode such information with more precise elements elsewhere in the TEI header, when such elements are available. For example, text types, such as reportage or detective stories, should be described under <profileDesc>

The Source Description statement

All electronic works need to document their source, even 'born digital' ones! There is a variety of elements you may draw from:

prose description, just a <p>

<bibl> (bibliographic citation): contains free text and/or any mixture of bibliographic elements such as <author>, <publisher> etc.

<biblStruct> (structured): contains similar elements but constrained in various ways according to bibliographic standards

<biblFull> (fully-structured): special-cases texts which were born TEI by replicating an embedded <fileDesc>

A <listBibl> may be used for lists of such descriptions, e.g. bibliographies

Specialised elements for spoken texts (<recordingStmt> etc.) and for manuscripts (<msDesc>). Discussed later!

Authority lists: <listPerson>, <listPlace>, <listOrg>
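
Of the options listed, <biblFull> is the one that reuses the shape of a file description. A minimal sketch, assuming the source was itself an earlier TEI file; every name, title and date here is hypothetical.

```xml
<sourceDesc>
  <!-- <biblFull> mirrors the <fileDesc> of the earlier TEI file -->
  <biblFull>
    <titleStmt>
      <title>A hypothetical poem: first electronic edition</title>
      <author>A. N. Author</author>
    </titleStmt>
    <editionStmt>
      <edition>First edition</edition>
    </editionStmt>
    <publicationStmt>
      <publisher>A hypothetical text archive</publisher>
      <date when="2010">2010</date>
    </publicationStmt>
    <sourceDesc>
      <p>Born digital</p>
    </sourceDesc>
  </biblFull>
</sourceDesc>
```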

<sourceDesc> example

<sourceDesc>
  <biblStruct>
    <analytic>
      <title>Letter to Leslie Gunston</title>
      <author>Wilfred Owen</author>
    </analytic>
    <monogr>
      <title>The Wilfred Owen Collection</title>
      <ref target="http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243/4769">First World
        War Poetry Digital Archive</ref>
      <imprint>
        <publisher>The First World War Poetry Digital Archive</publisher>
        <pubPlace>Oxford</pubPlace>
        <biblScope type="pp" n="2">Two pages</biblScope>
      </imprint>
    </monogr>
    <relatedItem>
      <bibl>The source of this digital resource is a copy from the <distributor>Harry
        Ransom Centre</distributor>.</bibl>
    </relatedItem>
  </biblStruct>
</sourceDesc>

Description of the sources

Most digitized texts were not created in digital form, so it is necessary to describe their sources

TEI provides a wide range of bibliographic elements, both structured and unstructured:

<bibl>, <biblStruct>

(for a text already computerized): <biblFull> (same content as <fileDesc>)

<listBibl>: a list of the items above

prose description

and more specialized items for transcripts of speech or manuscripts.
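
When a text draws on more than one source, the descriptions can be grouped. The following sketch puts a loose <bibl> and a structured <biblStruct> in one <listBibl>; both sources, and their xml:id values, are invented for the example.

```xml
<sourceDesc>
  <listBibl>
    <!-- loosely structured citation: free text mixed with elements -->
    <bibl xml:id="src-a">
      <author>A. N. Author</author>, <title>A Hypothetical First Source</title>,
      <date when="1901">1901</date>.
    </bibl>
    <!-- the same kind of information, but with enforced structure -->
    <biblStruct xml:id="src-b">
      <monogr>
        <author>A. N. Other</author>
        <title>A Hypothetical Second Source</title>
        <imprint>
          <pubPlace>Oxford</pubPlace>
          <date when="1902">1902</date>
        </imprint>
      </monogr>
    </biblStruct>
  </listBibl>
</sourceDesc>
```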

Classic source (1)

<sourceDesc>
  <biblStruct xml:lang="pl">
    <monogr>
      <author>Henryk Sienkiewicz</author>
      <title>Quo Vadis</title>
      <title type="sub">Powieść z czasów Nerona</title>
      <imprint>
        <pubPlace>Warszawa</pubPlace>
        <publisher>Gebethner i Wolff</publisher>
        <date>1896</date>
      </imprint>
    </monogr>
  </biblStruct>
</sourceDesc>

Classic source (2)

<bibl type="book" subtype="monograph" xml:id="brief_discours_1614">
  <title level="m">Brief Discours pour la reformation des mariages</title>.
  <pubPlace>Paris</pubPlace>, de l’imprimerie d’<publisher>Anthoine du Brueil</publisher>,
  rue Saint-Jacques, au dessus de Saint-Benoist, à la Couronne,
  <date when="1614">1614</date>, <biblScope type="pp">pp 3-16</biblScope> dans
  <title level="m">Variétés Historiques et Littéraires. Recueil de pièces volantes rares et
    curieuses en prose et en vers</title>, Revues et annotés par M.
  <editor>
    <name>
      <forename>Édouard</forename>
      <surname>Fournier</surname>
    </name>
  </editor>, <biblScope type="vol">Tome IV</biblScope>.
  A <pubPlace>Paris</pubPlace>, Chez <publisher>P. Jannet</publisher>.
  <date when="1856">MDCCCLVI</date>.
</bibl>

Speech

<sourceDesc>
  <recordingStmt>
    <recording type="audio" dur="PT30M">
      <respStmt>
        <resp>Location recording by</resp>
        <name>Sound Services Ltd.</name>
      </respStmt>
      <equipment>
        <p>Multiple close microphones mixed down to stereo Digital Audio Tape, standard
          play, 44.1 kHz sampling frequency</p>
      </equipment>
      <date>12 Jan 1987</date>
    </recording>
  </recordingStmt>
</sourceDesc>

<sourceDesc>
  <recordingStmt>
    <recording type="video" dur="PT60M">
      <p><title>24 Heures</title>: television broadcast, <date when="1989-06-24">24 June 1989</date></p>
    </recording>
  </recordingStmt>
</sourceDesc>

Born digital source

<sourceDesc>
  <bibl>
    <title>Manifeste des Digital humanities</title>
    <author>Marin Dacos</author>
    <ref target="http://tcp.hypotheses.org/318">http://tcp.hypotheses.org/318</ref>
    <date when="2010-05-21"/>
  </bibl>
</sourceDesc>

<sourceDesc>
  <p>No source: this document was born digital</p>
</sourceDesc>

Manuscript source

<sourceDesc>
  <msDesc>
    <msIdentifier>
      <country>Poland</country>
      <settlement>Kraków</settlement>
      <repository>Biblioteka Czartoryskich</repository>
      <idno>1594</idno>
    </msIdentifier>
    <msContents>
      <p>Fair copy of the letter of Ioannes Dantiscus to Jan Balinski</p>
    </msContents>
    <physDesc>
      <p>One page of paper in good condition</p>
      <handDesc>
        <handNote xml:id="ID" scope="major">Ioannes Dantiscus</handNote>
      </handDesc>
    </physDesc>
  </msDesc>
</sourceDesc>

Association between header and text

By default everything asserted by a header is true of the text to which it is prefixed. This can be over-ridden:

as when a text header over-rides or amplifies a corpus-header setting

when att.declarable elements are selected by means of the @decls attribute (available on all att.declaring elements)

using special purpose selection/definition elements e.g. <catRef> and <taxonomy>

Most components of the encoding description are declarable.
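
A minimal sketch of selection with @decls, assuming a header that carries two <editorialDecl> elements: the one marked default="true" applies unless a part of the text points elsewhere. The xml:id values and the prose are invented for the example.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A hypothetical two-part text</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <!-- applies wherever nothing else is selected -->
      <editorialDecl xml:id="ED-default" default="true">
        <p>Spelling has been left as in the source.</p>
      </editorialDecl>
      <!-- applies only where a @decls attribute points to it -->
      <editorialDecl xml:id="ED-modern">
        <p>Spelling has been modernized.</p>
      </editorialDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>This division follows the default declaration.</p>
      </div>
      <div decls="#ED-modern">
        <p>This division selects the second declaration.</p>
      </div>
    </body>
  </text>
</TEI>
```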

Encoding Description

<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as

<projectDesc>: goals of the project

<samplingDecl>: sampling principles

<editorialDecl>: editorial principles, e.g. <correction>, <normalization>, <quotation>, <hyphenation>, <segmentation>, <interpretation>

<classDecl>: classification system(s) used

<tagsDecl>: specifics about usage of particular elements

Detailed notes in <encodingDesc> could be used to generate a section of an editorial description.

<encodingDesc> Example (1)

<encodingDesc>
  <projectDesc>
    <p>The TEI@Oxford project created teaching materials for DHOXSS</p>
  </projectDesc>
  <editorialDecl>
    <correction>
      <p>Apparent errors have been marked as <gi>sic</gi> but corrected readings not
        provided</p>
    </correction>
    <hyphenation>
      <p>Hyphens have been transcribed as they appear.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>

<encodingDesc> Example (2)

<encodingDesc>
  <classDecl>
    <taxonomy xml:id="part-of-speech">
      <category xml:id="adje">
        <catDesc>adjectives</catDesc>
        <category xml:id="AJ0">
          <catDesc>adjective (unmarked) (e.g. GOOD, OLD)</catDesc>
        </category>
        <category xml:id="AJC">
          <catDesc>comparative adjective (e.g. BETTER, OLDER)</catDesc>
        </category>
        <category xml:id="AJS">
          <catDesc>superlative adjective (e.g. BEST, OLDEST)</catDesc>
        </category>
      </category>
      <category xml:id="AT0">
        <catDesc>article (e.g. THE, A, AN)</catDesc>
      </category>
      <!-- ... -->
    </taxonomy>
  </classDecl>
</encodingDesc>

<w ana="#AJ0">brilliant</w>

The tagging declaration

Records the namespace of the elements used, tag frequency, information about the usage of particular tags not specified elsewhere, and the default appearance of text in the source.
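
A sketch of what such a declaration might look like, using <namespace> and <tagUsage> inside <tagsDecl>. The counts and usage notes are invented; a real project would generate them from its own files.

```xml
<tagsDecl>
  <!-- how italic text appeared in the source -->
  <rendition xml:id="r-italic" scheme="css">font-style: italic;</rendition>
  <!-- one <namespace> per XML namespace whose elements are documented -->
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- @gi names the element, @occurs counts its occurrences -->
    <tagUsage gi="hi" occurs="28">Used only for words italicized in the source.</tagUsage>
    <tagUsage gi="foreign" occurs="12">Used for non-English words and phrases.</tagUsage>
  </namespace>
</tagsDecl>
```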

Rendition

<rendition>: structured information about appearance in the source document

expressed using informal prose description, a standard stylesheet language (CSS, XSL-FO), or a project-defined language.

<rendition> element

<tagsDecl>
  <rendition xml:id="r-center" scheme="css">text-align: center;</rendition>
  <rendition xml:id="r-small" scheme="css">font-size: small;</rendition>
  <rendition xml:id="r-large" scheme="css">font-size: large;</rendition>
</tagsDecl>

which you can easily point to from the text:

<hi rendition="#r-center #r-large">this bit of text was large and centred</hi>

but compare:

<hi rend="large center">this bit of text was large and centred</hi>

<appInfo> element

<appInfo>: structured information about an application which has edited this TEI file

<appInfo>
  <application version="1.8.2.2" ident="ImageMarkupTool" notAfter="2012-06-01">
    <label>Image Markup Tool</label>
    <ptr target="#P1"/>
    <ptr target="#P2"/>
  </application>
</appInfo>

48/60

Page 49: Metadata matters :describing a TEItext, its contents,its ... › wp-content › uploads › 2015 › ...Metadata matters :describing a TEItext, its contents,its meanings Magdalena

General <teiHeader> structure

<teiHeader>  <fileDesc>    <titleStmt> <!-- title information here --> </titleStmt>    <editionStmt><!-- edition information here --></editionStmt>    <publicationStmt><!-- publication information here --> </publicationStmt>    <seriesStmt><!-- series related information here --></seriesStmt>    <notesStmt><!-- any related notes here --></notesStmt>    <sourceDesc><!-- source description here --> </sourceDesc>  </fileDesc>  <encodingDesc> <!-- encoding description here --></encodingDesc>  <profileDesc> <!-- profile description here --></profileDesc>  <revisionDesc><!-- revision description here --></revisionDesc></teiHeader>        


Profile Description

A collection of descriptions, categorised only as "non-bibliographic". Default members of the model.profileDescPart class include:

<creation>: information about the origination of the intellectual content of the text, e.g. time and place

<langUsage>: information about the languages, registers, writing systems etc. used in the text

<textDesc> and <textClass>: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers, respectively

<particDesc> and <settingDesc>: information about the participants and the settings, either real or depicted, in the text

<handNotes>: information about the particular styles or hands distinguished within a manuscript


<creation> example

<creation>
  <date when="1918-05"/>
  <placeName>Ripon</placeName>
  <listChange ordered="true">
    <change xml:id="CHG-1">First stage, written in pencil in Owen's hand</change>
    <change xml:id="CHG-2">Second stage, revised in pencil in Owen's hand</change>
    <change xml:id="CHG-3">Fixation of the revised passages and further minor revisions by Owen using ink</change>
    <change xml:id="CHG-4">Addition of another stanza with a different ink, probably at a later stage</change>
  </listChange>
</creation>

Here <listChange> records stages in changes to the document. Further down, in <revisionDesc>, the same element is used to record changes to the electronic file.


Language and character set usage

The <langUsage> element is provided to document usage of languages and writing systems in the text. Languages are identified by their ISO codes:

<langUsage>
  <language ident="en">English</language>
  <language ident="fr">French</language>
  <language ident="bg-cy">Bulgarian in Cyrillic characters</language>
  <language ident="bg">Romanized Bulgarian</language>
</langUsage>
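Current TEI practice expects the value of @ident to be a BCP 47 language tag, in which a script is marked with a four-letter ISO 15924 subtag. A minimal sketch of the same declaration written that way, assuming the two Bulgarian entries differ only in script (the tags bg-Cyrl and bg-Latn are substitutions made here, not the original values):

```xml
<langUsage>
  <language ident="en">English</language>
  <language ident="fr">French</language>
  <!-- bg-Cyrl: Bulgarian written in Cyrillic script (assumed equivalent of "bg-cy") -->
  <language ident="bg-Cyrl">Bulgarian in Cyrillic characters</language>
  <!-- bg-Latn: Bulgarian written in Latin script (assumed equivalent of the romanized "bg") -->
  <language ident="bg-Latn">Romanized Bulgarian</language>
</langUsage>
```

A passage in the text would then carry the matching value on @xml:lang, for example xml:lang="bg-Latn".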


Classification Methods

<textClass> groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. in one or more of the following ways:

using <catRef>: direct reference to a locally defined (e.g. in the corpus header) category

using <classCode>: reference to some standard and externally defined classification scheme

using <keywords>: assign arbitrary descriptive terms taken from a bibliographic controlled vocabulary or a tag cloud
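The first of these options could look roughly like the sketch below: categories are declared once in <classDecl> (inside <encodingDesc>) and a text points at one of them with <catRef>. The taxonomy, its identifiers and its category descriptions are hypothetical and only serve as illustration.

```xml
<!-- in <encodingDesc>: the locally defined categories (hypothetical) -->
<classDecl>
  <taxonomy xml:id="genre">
    <category xml:id="genre-letter">
      <catDesc>Letters</catDesc>
    </category>
    <category xml:id="genre-poem">
      <catDesc>Poems</catDesc>
    </category>
  </taxonomy>
</classDecl>

<!-- in <profileDesc>: the text is assigned to one of the categories -->
<textClass>
  <catRef scheme="#genre" target="#genre-poem"/>
</textClass>
```

Because @target is a pointer, the taxonomy can equally live in a corpus header or in a separate file, as long as the reference resolves.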


BNC Example

<profileDesc>
  <creation>
    <date when="1962"/>
  </creation>
  <textClass>
    <classCode scheme="DLEE">W nonAc: humanities arts</classCode>
    <keywords scheme="COPAC">
      <term>History, Modern - 19th century</term>
      <term>Capitalism - History - 19th century</term>
      <term>World, 1848-1875</term>
    </keywords>
  </textClass>
</profileDesc>

This categorization applies to the whole text. For more fine-grained classification, use @decls on e.g. a <div> element to point to the applicable declaration in the header.
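A minimal sketch of the @decls mechanism, assuming a header that holds two alternative <textClass> declarations; the xml:id values and the way the keywords are split between the two are hypothetical:

```xml
<!-- in <profileDesc>: two alternative classifications -->
<textClass xml:id="tc-history" default="true">
  <keywords scheme="COPAC">
    <term>History, Modern - 19th century</term>
  </keywords>
</textClass>
<textClass xml:id="tc-economics">
  <keywords scheme="COPAC">
    <term>Capitalism - History - 19th century</term>
  </keywords>
</textClass>

<!-- in <text>: this division selects the non-default classification -->
<div decls="#tc-economics">
  <!-- content of the division -->
</div>
```

Divisions without @decls fall back on the declaration marked default="true".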


Detailed characterization of a text

<textDesc> provides a description of a text in terms of its situational parameters, that is, a description of the situation within which the text was produced or experienced.

<textDesc n="novel">
  <channel mode="w">print; part issues</channel>
  <constitution type="single"/>
  <derivation type="original"/>
  <domain type="art"/>
  <factuality type="fiction"/>
  <interaction type="none"/>
  <preparedness type="prepared"/>
  <purpose type="entertain" degree="high"/>
  <purpose type="inform" degree="medium"/>
</textDesc>

These subelements constitute the class model.textDescPart: you could modify that for other parameters.


<particDesc> example (1)

<particDesc xml:id="p2">
  <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2 in the PEP classification scheme.</p>
</particDesc>

<particDesc> can just contain paragraphs of prose, or a more structured <person> element in <listPerson>
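For comparison, the prose description of the informant could be broken down into structured elements along the following lines. This is only a sketch: the choice of elements, the sex="F" and level="fluent" values, and the #pep and #b2 pointers are assumptions made for illustration and would need to match whatever the project actually declares.

```xml
<particDesc>
  <listPerson>
    <person xml:id="p2" sex="F">
      <birth when="1950-01-12">
        <placeName>Shropshire, UK</placeName>
      </birth>
      <langKnowledge>
        <!-- level value is free text chosen for this sketch -->
        <langKnown tag="fr" level="fluent">French</langKnown>
      </langKnowledge>
      <education>well-educated</education>
      <occupation>unknown</occupation>
      <!-- #pep and #b2 are hypothetical pointers to a scheme and a code defined elsewhere -->
      <socecStatus scheme="#pep" code="#b2"/>
    </person>
  </listPerson>
</particDesc>
```

The structured form is more work to produce, but each property can then be queried or indexed separately.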


<particDesc> example (2)

<particDesc>
  <listPerson>
    <person xml:id="SL">
      <persName>Stuart Lee</persName>
      <note><ref target="http://users.ox.ac.uk/~stuart/Site/About_Me.html">Stuart Lee's home page</ref></note>
      <!-- We could give more details about Stuart here -->
    </person>
    <person xml:id="SR">
      <persName>Sebastian Rahtz</persName>
      <note><ref target="http://uk.linkedin.com/pub/sebastian-rahtz/a/937/208">Sebastian Rahtz's entry in LinkedIn</ref></note>
    </person>
  </listPerson>
</particDesc>




Revision Description

A list of <change> elements, each with @when and @who attributes, indicating significant stages in the evolution of a document. Most recent first.

Can be grouped into <listChange> elements. Used here it is about the electronic file; used in <creation> it is about the document.

Can be maintained manually, or by means of a version control system (like SVN)

<revisionDesc>
  <listChange>
    <change when="2012-07-03"><persName>James Cummings</persName> improved the header.</change>
    <change when="2012-02"><persName>Renée van Baalen</persName> transcribed the <title>Letter to Leslie Gunston</title> document.</change>
  </listChange>
</revisionDesc>
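The same log could also name the responsible people through @who instead of an inline <persName>. A sketch, assuming each person has been given an identifier elsewhere in the header; the identifiers JC and RvB and the <respStmt> wording are hypothetical:

```xml
<!-- in <titleStmt>: people declared once, with identifiers (hypothetical) -->
<respStmt>
  <resp>header</resp>
  <name xml:id="JC">James Cummings</name>
</respStmt>
<respStmt>
  <resp>transcription</resp>
  <name xml:id="RvB">Renée van Baalen</name>
</respStmt>

<!-- in <revisionDesc>: each change points back to its author -->
<revisionDesc>
  <listChange>
    <change when="2012-07-03" who="#JC">Improved the header.</change>
    <change when="2012-02" who="#RvB">Transcribed the <title>Letter to Leslie Gunston</title> document.</change>
  </listChange>
</revisionDesc>
```

Pointing with @who keeps names consistent across entries and makes it straightforward to list all changes by one person.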


Next

Any Questions? Next, the timetable says we're doing an exercise!

Guidelines for future reference

http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html
