Digital Design of Paper Technologies Workshop
On the durability and curatability of Digital Scholarly Editions
Daniel Bruder
Those who cannot remember the past are condemned to repeat it.
Those who cannot reconstruct the archive(s) are doomed to not be able to remember the past?
Paper technologies
Authors who write with pen and paper often produce non-linear or multi-linear texts. The resulting printed book version (“letzte Hand”) is just the final linearisation step of the author’s on-going process of revising the text. To record these revision processes in Digital Scholarly Editions, data structures are needed that can naturally cope with such requirements of multi-linearity. Moreover, the Digital Scholarly Editions and thus the Digital Archives produced in this way must offer two sine qua non qualities, namely: longevity and interoperability.
Books
- stability
- readability
- comparability
- relatability
- disseminability
- longevity
- interoperability
Digital Editions
- search
- access
- relation
- publication
- transmission
- collaboration
- combination
- mutation
- . . .
print vs digital?
- books have a formidable shelf life, they last for centuries
- digital data doesn’t. It is shockingly ephemeral and highly prone to obsolescence
- we better not falsely rely on the durability of our digital data
- we better be extremely careful in the curation of our digital assets
Digital Scholarly Editions: print vs digital?
- the practice and methodology of editing is well-established, mature and stable
- every task that is natural with books needs a custom tool/implementation in the digital medium
- not everything we can do and offer as programmers is necessarily meaningful and/or useful to humanities scholars’ needs
Digital Humanities
- meaningful conversation
Standards
. . . are a good thing!
Standards
- Motivations behind the TEI
- restriction to a specific set of tags (schema) to guarantee standardized storage
- tools to extract relevant parts, according to the schema
- founded on the OHCO (Ordered Hierarchy of Content Objects) tree model assumption
Problems
- OHCO (Ordered Hierarchy of Content Objects) tree model assumption
- Natural language (as found in the production stage) of manuscripts and its often non-hierarchical nature
- Non-hierarchical structures, the tree model assumption and the need for “Twisted XML”
- Twisted XML and the fragmentation of knowledge
- The fragmentation of knowledge and the idiosyncrasy of the archives
- The accumulating idiosyncrasy of the archives and the potential non-reconstructibility
Twisted XML

Lorem ipsum dolor sit amet, consectetur adipisicing elit
`----'`---'                  comment1
      `------------------'   comment2
Twisted XML
<p><span id="comment1" next="comment1-2">Lorem</span>
<span id="comment2"><span id="comment1-2">ipsum</span> dolor sit amet</span>,
consectetur adipisicing elit</p>
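Even this tiny fragment already needs a custom extractor: reading comment1 back out means chasing the `next` pointers across the fragments, while the nested comment2 needs a different access path entirely. A minimal sketch in Python (element and attribute names are taken from the snippet above; the traversal logic is an assumption about how the `next` chain is meant to be followed):

```python
# Reassembling a discontinuous ("twisted") annotation from fragmented spans.
import xml.etree.ElementTree as ET

xml = ('<p><span id="comment1" next="comment1-2">Lorem</span> '
       '<span id="comment2"><span id="comment1-2">ipsum</span> dolor sit amet'
       '</span>, consectetur adipisicing elit</p>')

root = ET.fromstring(xml)
spans = {s.get("id"): s for s in root.iter("span")}

def annotation_text(start_id):
    """Concatenate the text of a fragmented annotation by following `next`."""
    parts, cur = [], spans.get(start_id)
    while cur is not None:
        parts.append(cur.text or "")
        cur = spans.get(cur.get("next"))
    return " ".join(parts)

print(annotation_text("comment1"))  # Lorem ipsum

# comment2 is contiguous but nested, so it needs a different access path:
print("".join(spans["comment2"].itertext()))  # ipsum dolor sit amet
```

Note that each annotation shape requires its own traversal; nothing in the schema tells a generic tool which strategy applies where.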
Sidenote: Fragmentation of “de-facto standards”
Figure 1: Idiosyncrasy
Twisted XML
These Twisted XML structures
- are idiosyncratic to each project
- are beyond what the schema can capture
- establish structures beyond the tree paradigm
- XML tools are not ready for them
- humans are not ready for them
- dramatically limit the re-use of tools and the durability of the data
Fully TEI-compliant sources – still no longevity!
Figure 2: Possible Archives?
Nested structures?
We are essentially dealing with the problem of representing multiple hierarchies.
Confident reconstruction?
The fragmentation and idiosyncratic nature of these “archives” make it impossible to re-use and share the much-needed extractors. Any such project needs custom extractors – specifically tailored to their data. These custom extractors are expensive. Confidence in these extractors can only be given by the editor (if at all).
Meaningful access?
There is no “protocol” for the combination of extractions: what if I want to combine several aspects?
A different model
- we are essentially dealing with graph structures vs. tree structures
- current practices have an intrinsic obsolescence
- the archives produced today are essentially “dead”: difficult to annotate, difficult to disseminate, difficult to reconstruct
A different model
- embracing graphs
- embracing “shared paths” over the text, representing different perspectives/readings/hierarchies/aspects
- precise pre-indexing vs. error-prone post-extraction
- confident rendering of derivatives:
  - XML/TEI
  - other formats
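The “shared paths” idea can be sketched as a small directed graph (node and edge names here are invented for illustration, not the speaker’s actual model): two readings share every node except where they diverge, and each complete reading is one path through the graph.

```python
# A text graph: readings share nodes and diverge only where witnesses differ.
edges = {
    "START": ["lorem"],
    "lorem": ["ipsum"],
    "ipsum": ["dolor", "sit"],  # reading A inserts "dolor", reading B skips it
    "dolor": ["sit"],
    "sit":   ["amet"],
    "amet":  ["END"],
}

def paths(node="START", prefix=()):
    """Enumerate every complete reading encoded in the graph (DFS)."""
    if node == "END":
        yield " ".join(prefix)
        return
    for nxt in edges[node]:
        label = () if nxt == "END" else (nxt,)
        yield from paths(nxt, prefix + label)

for reading in paths():
    print(reading)
# lorem ipsum dolor sit amet
# lorem ipsum sit amet
```

Because both readings are pre-indexed as paths, no error-prone post-extraction is needed to recover either one.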
What if. . . ?
What if
- we could significantly reduce the cognitive load of the transcriber by giving them more ‘natural’ ways to do their work?
- we could confidently produce only the intended readings and properly expose the variations and production processes?
- we could easily add more interesting sources, like diaries, etc., and make these truly interoperable?
- we could effortlessly access each and every single piece of information in the same, generic way?
- we could flexibly combine these pieces of information?
- we could progressively go way beyond these isolated pieces of information and their combination and even make ‘knowledge’ available that is not explicitly recorded?
- we could produce DSEs in more adequate ways and be more confident about their possible future use?
What if. . . ?

we could, in general,
- achieve more accurate results
- with confidence
- in higher quality
- in significantly less time
- in more adequate and natural ways
- that are, in fact, re-usable and durable
- and, ideally, import as much as possible from the old repositories?
Graphs vs Trees
Figure 3: Facsimile
Graphs vs Trees
Figure 4: Transcription
Graphs
Figure 5: Graph structure
Graph structures with Edition Operations (a different divide)

- 5 + 1 ≡ 561 + 0
Edition Operations
- insertion
- deletion
- variation
- substitution
- transposition
- annotation
Graph structures
d1: i1 = -----------
v1: i1 = variated
i1: b0 = inserted at the end
b0 = This is the baseline with content .
-----------------------------------------------------------------
c1: b0 = < > comment1
c2: b0 = < > comment2
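One way to read this layered notation is as standoff operations over an immutable baseline b0: every operation targets a layer (b0 or another operation), and a concrete reading is derived on demand rather than rewritten into the text. The following is a rough sketch under that interpretation; the field names and the `render` logic are assumptions, not the actual data model.

```python
# Standoff edition operations over a baseline layer that is never rewritten.
from dataclasses import dataclass

@dataclass
class Op:
    op_id: str
    kind: str     # one of the edition operations: insertion, variation, ...
    target: str   # id of the layer/operation it applies to
    payload: str

base = "This is the baseline with content ."
ops = [
    Op("i1", "insertion", "b0", "inserted at the end"),
    Op("v1", "variation", "i1", "variated"),
    Op("c1", "annotation", "b0", "comment1"),
]

def render(chosen):
    """Derive one concrete reading: the base text plus every insertion,
    where a chosen variation substitutes for the operation it targets."""
    text = base
    for o in ops:
        if o.kind == "insertion":
            variant = next((v for v in ops
                            if v.kind == "variation"
                            and v.target == o.op_id
                            and v.op_id in chosen), None)
            text += " " + (variant.payload if variant else o.payload)
    return text

print(render(set()))      # baseline reading with the insertion i1
print(render({"v1"}))     # variant reading, v1 substituting for i1
```

Annotations like c1 stay attached to b0 without ever touching the derived text, which is what makes overlapping layers unproblematic here.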
Hierarchy and Path Sharing and Specialization
Figure 6: Path Sharing
Examples and Patterns
add: base = dolor
base = lorem ipsum sit amet
Figure 7:
(more on the bonus slides)
Benefits
By using graph structures we can
- accurately model, pre-index and record exactly what we intend
- make post-extraction unnecessary
- naturally host different – potentially overlapping – readings, annotations, views and perspectives
- export to a variety of formats
- have discernible and intelligible, future-proof sources
- progressively add and combine with metadata
Metadata
Figure 8: Pocket Diaries
Metadata
Figure 9: Prototractatus
Full rendering as Linked Open Data for meaningful access
By using Semantic Web Technologies and rendering our data into a Linked Open Data knowledge base, we can
- meaningfully model and add value to our knowledge
- do reasoning and inferencing
- flexibly query and access all aspects
- integrate with all kinds of external sources and metadata
Knowledge Base + Ontology (Vocabulary, Rules) + Reasoners = Inferences

- Recording of knowledge
- Production of knowledge, new insights
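A toy illustration of this equation (the vocabulary and the rule are invented for illustration, not drawn from any actual ontology): a few recorded triples plus one forward-chaining rule let a reasoner produce a fact that was never explicitly recorded.

```python
# Knowledge base as a set of triples; one rule; naive forward chaining.
kb = {
    ("ms_a", "transcribedFrom", "notebook_1"),
    ("notebook_1", "writtenBy", "author_x"),
}

def reason(kb):
    """Apply one made-up rule to a fixpoint: if m was transcribed from a
    source written by p, infer that p also wrote m."""
    inferred, changed = set(kb), True
    while changed:
        changed = False
        for (m, p1, src) in list(inferred):
            if p1 != "transcribedFrom":
                continue
            for (s, p2, person) in list(inferred):
                if s == src and p2 == "writtenBy":
                    triple = (m, "writtenBy", person)
                    if triple not in inferred:
                        inferred.add(triple)
                        changed = True
    return inferred

facts = reason(kb)
print(("ms_a", "writtenBy", "author_x") in facts)  # True: inferred, not recorded
```

The same division of labour holds at scale: the knowledge base records, the ontology constrains vocabulary and rules, and the reasoner produces the new insights.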
Bonus
Deletion
Figure 10:
Substitution
Figure 11:
Variation
Figure 12:
Transposition
Figure 13:
Annotation
add: base = dolor
base = lorem ipsum sit amet
--------------------------------------
anno: base = < > "annotation"