More Text Encoding Initiative (TEI) 6/30 XML + XSLT for Libraries

Jan 19, 2016


Arella

Page 1: More Text Encoding Initiative (TEI)

More Text Encoding Initiative (TEI)

6/30

XML + XSLT for Libraries

Today

• Basic anatomy of TEI
• Capturing the structure of source documents
• Capturing more than the structure
• Building personographies
• Using <teiCorpus>
• In class: continue Assignment 5: Mark up digital texts in TEI

Basic anatomy of TEI

• <TEI> is the root element

• <teiHeader> - holds the metadata about the digital document you are creating
– this element is similar to <eadheader> in EAD

• <text> - where the transcription of the source document is captured
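
As a rough sketch of how these three elements fit together, assuming the TEI P5 namespace and with the header contents left as a placeholder comment, a bare document might look like this (the <body> wrapper inside <text> matches the chapter example later in these slides):

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata about this digital document goes here -->
  </teiHeader>
  <text>
    <body>
      <!-- placeholder paragraph, not taken from any real source -->
      <p>Transcription of the source document goes here.</p>
    </body>
  </text>
</TEI>
```

This skeleton is not valid on its own, because the header still needs its required children.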

Required elements of <teiHeader>

• <fileDesc> - a wrapper element for capturing these required elements:

<titleStmt> - title of your TEI document (not the original document you are transcribing)

<publicationStmt> - for publication information about your TEI document

<sourceDesc> - for describing the original document you are transcribing
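
A minimal header using only the required elements might look like the sketch below. The title and the prose inside <publicationStmt> and <sourceDesc> are invented for illustration; a real project would describe its own files.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- title of the TEI file, not of the source document (hypothetical) -->
      <title>Postcard to Hattie: a digital transcription</title>
    </titleStmt>
    <publicationStmt>
      <!-- hypothetical publication note -->
      <p>Unpublished classroom exercise.</p>
    </publicationStmt>
    <sourceDesc>
      <!-- hypothetical description of the original -->
      <p>Transcribed from a handwritten postcard.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```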

<teiHeader> examples

• While there are several required elements inside <teiHeader>, the structure of these elements is fairly flexible

– A less structured example that uses <p> tags: http://slis.uiowa.edu/~jlee/239/sampledocs/sampleTEIbook.xml

– A more structured example that uses more detailed tags such as <msIdentifier>: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml

Capturing the structure of your source document

Determining the level of your markup

• We will be transforming our TEI documents to web display as HTML.

• The more structure you capture in your transcription, the more flexible your display options will be later.
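
To illustrate why captured structure pays off at display time, here is a small XSLT 1.0 sketch that maps TEI <div>, <head>, and <p> elements to HTML. It assumes the TEI P5 namespace; the HTML class name and the choice of <h2> are arbitrary assumptions, not part of the course materials.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" indent="yes"/>

  <!-- Build the HTML page from the transcription only; the header is skipped -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//tei:text/tei:body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each TEI div becomes an HTML div (class name is an assumption) -->
  <xsl:template match="tei:div">
    <div class="tei-div">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- A div heading becomes an HTML heading -->
  <xsl:template match="tei:head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Paragraphs map one to one -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because each structural element has its own template, a transcription that marks chapters and headings can be restyled later without touching the TEI file itself.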

The <text> element

• <text> contains a single text of any kind

• You decide the scope of the <text> element
– A poem?
– A play?
– An essay?
– A collection of essays?

The <div> element

• Within <text>, <div> is used to describe some discrete structure of the source document

• You decide what <div> should represent:
– One poem? One stanza of a poem?
– One book? One chapter?

Sample <div> structure

• In this example, <div> represents one chapter:

<text>
  <body>
    <div>
      <head type="chapter">Chapter 1</head>
      <p>In this chapter, we will focus on….</p>
    </div>
    <div>
      <head type="chapter">Chapter 2</head>
      <p>In chapter one, you learned….</p>
    </div>
  </body>
</text>

The <group> element

• For more complex source documents, use <group> tags to capture a series of <text> elements

• For example, encoding a book of poems and using <text> for each poem and <div> to capture stanzas:

<text>
  <front> <!-- biographical notice by editor --> </front>
  <group>
    <text> <!-- first poem --> </text>
    <text> <!-- second poem --> </text>
  </group>
</text>

The <ab> element

• The anonymous block element, <ab>, is used to encode a discrete chunk of text

• It is generally used to describe paragraph-like elements, like <p> tags in HTML

Encoding line breaks

• To retain original breaks in texts:

– encode line breaks with line break <lb/> elements within anonymous block <ab> elements

<ab>Line one of text <lb/> Line two of text</ab>

– encode paragraph breaks with separate <ab> elements

<ab>This is the first paragraph…</ab>
<ab>This is the second paragraph…</ab>

Encoding more than the structure of your source document…

Capturing images

• To include an image of the source document, use the <facsimile> element before the <text> element:

<facsimile>

<graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/>

</facsimile>

*The URL points to a publicly accessible image file
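
To make the placement concrete, here is a sketch of where <facsimile> sits relative to the header and the transcription. It assumes the TEI P5 namespace, reuses the image URL from the slide, and leaves the header and transcription as placeholders.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- header contents omitted in this sketch -->
  </teiHeader>
  <!-- the facsimile comes after the header and before the text -->
  <facsimile>
    <graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/>
  </facsimile>
  <text>
    <body>
      <!-- placeholder paragraph -->
      <p>Transcription of the pictured document.</p>
    </body>
  </text>
</TEI>
```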

Identifying names

• Use the <name>, <orgName>, or <persName> element anywhere within the transcription

<div>
  <p>As I haven't time to write a letter I will just drop you a postal. How is <persName>Hattie</persName>? I have got a cold but that's all. this postal is kinda dirty but I got cause it is just what we will do isn't it. Just wait we'll let them know you're not dead. ha ha</p>
  <signed>bye. <persName>Golda</persName></signed>
</div>
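
The postcard above only uses <persName>. As a sketch of the other two elements, the invented sentence below tags an organization with <orgName> and falls back to the generic <name> for a name that is neither a person nor an organization. The sentence, the names in it, and the type value are all made up for illustration.

```xml
<!-- hypothetical sentence, not from the postcard collection -->
<p>We stopped at the <orgName>Madrid Public Library</orgName>
  and then walked <name type="pet">Rex</name> home.</p>
```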

Identifying places

• <placeName> for geo-political place names

– <placeName>Rochester, NY</placeName>

– <placeName><settlement type="city">Rochester</settlement>, <region type="state">New York</region></placeName>

• <geogName> for places named in terms of geographic features such as mountains, lakes, or rivers, independently of geo-political units

– <geogName type="river">Mississippi River</geogName>

Identifying dates

• <date> contains a date in any format

• <time> contains a phrase defining a time of day in any format

• the attribute @when normalizes the date or time in a standard form, e.g. yyyy-mm-dd

– <date when="1945-10-24">24 Oct 45</date>

– <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>

– <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>

Other elements can record date + time information

• Normalized dates and times can be expressed for other elements through attributes

– A complete table of “date-able” elements: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.html

• For example:

<birth when="1981-01-23">January 23, 1981</birth>

Expressing date spans and ambiguous dates

• @notBefore specifies the earliest possible date for the event

• @notAfter specifies the latest possible date for the event

• @from indicates the starting point of the period

• @to indicates the ending point of the period

• Each of these attributes also has an -iso form (e.g. @notBefore-iso) that takes ISO 8601 values

<residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
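
The example above covers the uncertain-date pair only. As a sketch of the difference between the two pairs, the fragment below uses @from and @to for a period with known end points, and @notBefore and @notAfter for a single event whose date is only known to fall within a range. The dates are borrowed from the slide, the place from the address example in these slides, and the birth range is invented.

```xml
<!-- a period with a known start and end -->
<residence from="1907-09-09" to="1910-09-06">Madrid, Iowa</residence>

<!-- one event, date known only within bounds (hypothetical range) -->
<birth notBefore="1885" notAfter="1890">late 1880s</birth>
```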

Elements applicable to correspondence

• <opener> groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter.

• <closer> groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.

• <dateline> contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.

• <salute> contains the salutation in the opening/closing of a letter, preface, etc.

• <signed> contains the closing signature

Sample use of <opener> and <closer>

<div type="letter" n="14">
  <head>Letter XIV: Miss Clarissa Harlowe to Miss Howe</head>
  <opener>
    <dateline>Thursday evening, March 2.</dateline>
  </opener>
  <p>On Hannah's depositing my long letter ...</p>
  <p>An interruption obliges me to conclude myself in some hurry, as well as fright, what I must ever be,</p>
  <closer>
    <salute>Yours more than my own,</salute>
    <signed>Clarissa Harlowe</signed>
  </closer>
</div>

• (Taken from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSOC)

Building a personography

• A personography is a list of normalized biographical data about persons tagged in your TEI document

• It can be referenced in multiple TEI documents

• It can be used to enhance search + browse tools

The <listPerson> element

• Personographies are contained within <sourceDesc> in the header

• @xml:id is used to uniquely identify a person

<listPerson>
  <person>
    <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName>
    <sex>female</sex>
    <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
  </person>
</listPerson>

Referencing personography data in the transcription

• Use @ref to refer to the @xml:id you assigned to that person

<address>
  <addrLine>Miss <persName ref="#HJ">Hattie Jacobs</persName></addrLine>
  <settlement>Madrid</settlement>
  <region>Iowa</region>
</address>

Other global lists

• Similarly, you can use @xml:id to create a global list of other elements
– <listPlace>
– <listOrg>
– <listBibl>
– <listEvent>
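
As a sketch of the same pattern applied to places, the fragment below defines one entry in a <listPlace> and then points to it from the transcription with @ref. The identifier MADRID is a made-up value; the place itself is borrowed from the address example in these slides.

```xml
<!-- in the header, e.g. inside <sourceDesc> -->
<listPlace>
  <place xml:id="MADRID"> <!-- hypothetical identifier -->
    <placeName>
      <settlement type="city">Madrid</settlement>,
      <region type="state">Iowa</region>
    </placeName>
  </place>
</listPlace>

<!-- in the transcription -->
<placeName ref="#MADRID">Madrid</placeName>
```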

Using <teiCorpus>

• <teiCorpus> can be used as a wrapper root element for multiple <TEI> documents

• <teiCorpus> has its own global header for capturing metadata about all of the <TEI> documents it contains

• Example – postcards: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
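
In outline, and assuming the TEI P5 namespace, a corpus file might be laid out as in the sketch below: one shared header first, then each <TEI> document with its own header. All contents here are placeholders rather than excerpts from the postcards file.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata shared by the whole collection -->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!-- metadata for the first document -->
    </teiHeader>
    <text>
      <body>
        <p>First document.</p>
      </body>
    </text>
  </TEI>
  <TEI>
    <teiHeader>
      <!-- metadata for the second document -->
    </teiHeader>
    <text>
      <body>
        <p>Second document.</p>
      </body>
    </text>
  </TEI>
</teiCorpus>
```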

Take a break

In class

• Continue Assignment 5: Mark up digital texts in TEI

• If you have finished encoding the basic structure in your TEI documents:
– try enhancing your markup with name, date, and place information
– try nesting your TEI documents within one <teiCorpus> document
– try building a personography