Digital Design of Paper Technologies Workshop
On the durability and curatability of Digital Scholarly Editions
Daniel Bruder
Those who cannot remember the past are condemned to repeat it.
Those who cannot reconstruct the archive(s) are doomed to not be able to remember the past?
Paper technologies
Authors who write with pen and paper often produce non-linear or multi-linear texts. The resulting printed book version (“letzte Hand”) is just the final linearisation step of the author’s on-going process of revising the text. To record these revision processes in Digital Scholarly Editions, data structures are needed that can naturally cope with such requirements of multi-linearity. Moreover, the Digital Scholarly Editions and thus the Digital Archives produced in this way must offer two sine qua non qualities, namely: longevity and interoperability.
Books
- stability
- readability
- comparability
- relatability
- disseminability
- longevity
- interoperability
Digital Editions
- search
- access
- relation
- publication
- transmission
- collaboration
- combination
- mutation
- . . .
print vs digital?
- books have a formidable shelf life, they last for centuries
- digital data doesn’t. It is shockingly ephemeral and highly prone to obsolescence
- we better not falsely rely on the durability of our digital data
- we better be extremely careful in the curation of our digital assets
Digital Scholarly Editions: print vs digital?
- the practice and methodology of editing is well-established, mature and stable
- every task that is natural with books needs a custom tool/implementation in the digital medium
- not everything we can do and offer as programmers is necessarily meaningful and/or useful to humanities scholars’ needs
Digital Humanities
- meaningful conversation
Standards
. . . are a good thing!
Standards
- Motivations behind the TEI
- restriction to a specific set of tags (schema) to guarantee standardized storage
- tools to extract relevant parts, according to the schema
- founded on the OHCO (Ordered Hierarchy of Content Objects) tree model assumption
Problems
- OHCO (Ordered Hierarchy of Content Objects) tree model assumption
- Natural language (as found in the production stage) of manuscripts and its often non-hierarchical nature
- Non-hierarchical structures, the tree model assumption and the need for “Twisted XML”
- Twisted XML and the fragmentation of knowledge
- The fragmentation of knowledge and the idiosyncrasy of the archives
- The accumulating idiosyncrasy of the archives and the potential non-reconstructibility
Twisted XML

Lorem ipsum dolor sit amet, consectetur adipisicing elit
`----'`---'                  comment1
      `------------------'   comment2
Twisted XML
<p><span id="comment1" next="comment1-2">Lorem</span>
<span id="comment2"><span id="comment1-2">ipsum</span> dolor sit amet</span>,
consectetur adipisicing elit</p>
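Even this tiny fragment already needs a custom extractor: reading comment1 back out means chasing the `next` pointers across the fragments, while the nested comment2 needs a different access path entirely. A minimal sketch in Python (element and attribute names are taken from the snippet above; the traversal logic is an assumption about how the `next` chain is meant to be followed):

```python
# Reassembling a discontinuous ("twisted") annotation from fragmented spans.
import xml.etree.ElementTree as ET

xml = ('<p><span id="comment1" next="comment1-2">Lorem</span> '
       '<span id="comment2"><span id="comment1-2">ipsum</span> dolor sit amet'
       '</span>, consectetur adipisicing elit</p>')

root = ET.fromstring(xml)
spans = {s.get("id"): s for s in root.iter("span")}

def annotation_text(start_id):
    """Concatenate the text of a fragmented annotation by following `next`."""
    parts, cur = [], spans.get(start_id)
    while cur is not None:
        parts.append(cur.text or "")
        cur = spans.get(cur.get("next"))
    return " ".join(parts)

print(annotation_text("comment1"))  # Lorem ipsum

# comment2 is contiguous but nested, so it needs a different access path:
print("".join(spans["comment2"].itertext()))  # ipsum dolor sit amet
```

Note that each annotation shape requires its own traversal; nothing in the schema tells a generic tool which strategy applies where.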
Sidenote: Fragmentation of “de-facto standards”
Figure 1: Idiosyncrasy
Twisted XML
These Twisted XML structures
- are idiosyncratic to each project
- are beyond what the schema can capture
- establish structures beyond the tree paradigm
- XML tools are not ready for them
- humans are not ready for them
- dramatically limit the re-use of tools and the durability of the data
Fully TEI-compliant sources – still no longevity!
Figure 2: Possible Archives?
Nested structures?
We are essentially dealing with the problem of representing multiple hierarchies.
Confident reconstruction?
The fragmentation and idiosyncratic nature of these “archives” make it impossible to re-use and share the much-needed extractors. Any such project needs custom extractors – specifically tailored to their data. These custom extractors are expensive. Confidence in these extractors can only be given by the editor (if at all).
Meaningful access?
There is no “protocol” for the combination of extractions: what if I want to combine several aspects?
A different model
- we are essentially dealing with graph structures vs. tree structures
- current practices have an intrinsic obsolescence
- the archives produced today are essentially “dead”: difficult to annotate, difficult to disseminate, difficult to reconstruct
A different model
- embracing graphs
- embracing “shared paths” over the text, representing different perspectives/readings/hierarchies/aspects
- precise pre-indexing vs. error-prone post-extraction
- confident rendering of derivatives:
  - XML/TEI
  - other formats
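The “shared paths” idea can be sketched as a small directed graph (node and edge names here are invented for illustration, not the speaker’s actual model): two readings share every node except where they diverge, and each complete reading is one path through the graph.

```python
# A text graph: readings share nodes and diverge only where witnesses differ.
edges = {
    "START": ["lorem"],
    "lorem": ["ipsum"],
    "ipsum": ["dolor", "sit"],  # reading A inserts "dolor", reading B skips it
    "dolor": ["sit"],
    "sit":   ["amet"],
    "amet":  ["END"],
}

def paths(node="START", prefix=()):
    """Enumerate every complete reading encoded in the graph (DFS)."""
    if node == "END":
        yield " ".join(prefix)
        return
    for nxt in edges[node]:
        label = () if nxt == "END" else (nxt,)
        yield from paths(nxt, prefix + label)

for reading in paths():
    print(reading)
# lorem ipsum dolor sit amet
# lorem ipsum sit amet
```

Because both readings are pre-indexed as paths, no error-prone post-extraction is needed to recover either one.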
What if. . . ?
What if
- we could significantly reduce the cognitive load of the transcriber by giving them more ‘natural’ ways to do their work?
- we could confidently produce only the intended readings and properly expose the variations and production processes?
- we could easily add more interesting sources, like diaries, etc., and make these truly interoperable?
- we could effortlessly access each and every single piece of information in the same, generic way?
- we could flexibly combine these pieces of information?
- we could progressively go way beyond these isolated pieces of information and their combination and even make ‘knowledge’ available that is not explicitly recorded?
- we could produce DSEs in more adequate ways and be more confident about their possible future use?
What if. . . ?

we could, in general,
- achieve more accurate results
- with confidence
- in higher quality
- in significantly less time
- in more adequate and natural ways
- that are, in fact, re-usable and durable
- and, ideally, import as much as possible from the old repositories?
Graphs vs Trees
Figure 3: Facsimile
Graphs vs Trees
Figure 4: Transcription
Graphs
Figure 5: Graph structure
Graph structures with Edition Operations (a different divide)

- 5 + 1 ≡ 561 + 0
Edition Operations
- insertion
- deletion
- variation
- substitution
- transposition
- annotation
Graph structures
d1: i1 = -----------
v1: i1 = variated
i1: b0 = inserted at the end
b0 = This is the baseline with content .
-----------------------------------------------------------------
c1: b0 = < > comment1
c2: b0 = < > comment2
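One way to read this layered notation is as standoff operations over an immutable baseline b0: every operation targets a layer (b0 or another operation), and a concrete reading is derived on demand rather than rewritten into the text. The following is a rough sketch under that interpretation; the field names and the `render` logic are assumptions, not the actual data model.

```python
# Standoff edition operations over a baseline layer that is never rewritten.
from dataclasses import dataclass

@dataclass
class Op:
    op_id: str
    kind: str     # one of the edition operations: insertion, variation, ...
    target: str   # id of the layer/operation it applies to
    payload: str

base = "This is the baseline with content ."
ops = [
    Op("i1", "insertion", "b0", "inserted at the end"),
    Op("v1", "variation", "i1", "variated"),
    Op("c1", "annotation", "b0", "comment1"),
]

def render(chosen):
    """Derive one concrete reading: the base text plus every insertion,
    where a chosen variation substitutes for the operation it targets."""
    text = base
    for o in ops:
        if o.kind == "insertion":
            variant = next((v for v in ops
                            if v.kind == "variation"
                            and v.target == o.op_id
                            and v.op_id in chosen), None)
            text += " " + (variant.payload if variant else o.payload)
    return text

print(render(set()))      # baseline reading with the insertion i1
print(render({"v1"}))     # variant reading, v1 substituting for i1
```

Annotations like c1 stay attached to b0 without ever touching the derived text, which is what makes overlapping layers unproblematic here.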
Hierarchy and Path Sharing and Specialization
Figure 6: Path Sharing
Examples and Patterns
add: base = dolor
base = lorem ipsum sit amet
Figure 7:
(more on the bonus slides)
Benefits
By using graph structures we can
- accurately model, pre-index and record exactly what we intend
- make post-extraction unnecessary
- naturally host different – potentially overlapping – readings, annotations, views and perspectives
- export to a variety of formats
- have discernible and intelligible, future-proof sources
- progressively add and combine with metadata
Metadata
Figure 8: Pocket Diaries
Metadata
Figure 9: Prototractatus
Full rendering as Linked Open Data for meaningful access
By using Semantic Web Technologies and rendering our data into a Linked Open Data knowledge base, we can
- meaningfully model and add value to our knowledge
- do reasoning and inferencing
- flexibly query and access all aspects
- integrate with all kinds of external sources and metadata
Knowledge Base + Ontology (Vocabulary, Rules) + Reasoners = Inferences

- Recording of knowledge
- Production of knowledge, new insights
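A toy illustration of this equation (the vocabulary and the rule are invented for illustration, not drawn from any actual ontology): a few recorded triples plus one forward-chaining rule let a reasoner produce a fact that was never explicitly recorded.

```python
# Knowledge base as a set of triples; one rule; naive forward chaining.
kb = {
    ("ms_a", "transcribedFrom", "notebook_1"),
    ("notebook_1", "writtenBy", "author_x"),
}

def reason(kb):
    """Apply one made-up rule to a fixpoint: if m was transcribed from a
    source written by p, infer that p also wrote m."""
    inferred, changed = set(kb), True
    while changed:
        changed = False
        for (m, p1, src) in list(inferred):
            if p1 != "transcribedFrom":
                continue
            for (s, p2, person) in list(inferred):
                if s == src and p2 == "writtenBy":
                    triple = (m, "writtenBy", person)
                    if triple not in inferred:
                        inferred.add(triple)
                        changed = True
    return inferred

facts = reason(kb)
print(("ms_a", "writtenBy", "author_x") in facts)  # True: inferred, not recorded
```

The same division of labour holds at scale: the knowledge base records, the ontology constrains vocabulary and rules, and the reasoner produces the new insights.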
Bonus
Deletion
Figure 10:
Substitution
Figure 11:
Variation
Figure 12:
Transposition
Figure 13:
Annotation
add: base = dolor
base = lorem ipsum sit amet
--------------------------------------
anno: base = < > "annotation"