Crowdsourced Manuscript Transcription Ben Brumfield Roots and Routes 2012
Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Nov 22, 2014

benwbrum

A three-hour workshop on crowdsourced transcription software, given at the University of Toronto's Roots and Routes seminar in 2012.
Transcript
Page 1: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Crowdsourced Manuscript Transcription

Ben Brumfield
Roots and Routes 2012

Page 2: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Not just crowdsourcing...

● Collaborative work
● Off-site solo work
● Private work

Page 3: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Not just manuscripts...

● Maps
● Textiles
● Music
● Flawed OCR

Page 4: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Not just transcription...

● Indexing
● Editing
● Identification

Counting seals on Arctic ice caps.

Page 5: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

What it isn't

We'll concentrate on web-based tools for extracting text from images, not addressing:
● Oral History
● Video
● Audio Transcription
● Image Manipulation
● Transcription/Facsimile Display

Tools exist for these tasks, nevertheless.

Page 6: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Break

What materials are you working with outside of modern, printed books and websites?

Page 7: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Origins (Approaches)

Two Approaches and one Dead End
● Indexing
● Editing
● Tagging

Page 8: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Indexing

● Structured Data
● Extracts from Text vs. Representing Text
● Databases for Search and Analysis
● Granular Quality Control
● Gamification

Page 9: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Editing

● Books, Diaries, Letters, Articles
● Representing Text
● Traditional Editorial Workflow
● Digital or Print Editions
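The contrast between the two approaches (indexing extracts structured data from the text, editing represents the text itself) can be sketched in a few lines of Python. This is only an illustration: the page text, the `IndexEntry` record and both function names are hypothetical and do not come from any of the tools discussed in this workshop.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical page text, invented purely for illustration.
PAGE_TEXT = "Oysters on the half shell .25\nConsomme in cup .15"


@dataclass
class IndexEntry:
    """One structured record extracted from a page (the indexing approach)."""
    item: str
    price: float


def index_page(text: str) -> List[IndexEntry]:
    """Extract item/price records; everything outside those fields is discarded."""
    entries = []
    for line in text.splitlines():
        # Split on the last space: the item name on the left, the price on the right.
        item, _, price = line.rpartition(" ")
        entries.append(IndexEntry(item=item, price=float(price)))
    return entries


def edit_page(text: str) -> str:
    """The editing approach keeps the full text as the product."""
    return text


entries = index_page(PAGE_TEXT)
transcript = edit_page(PAGE_TEXT)
```

The indexed records suit a database for search and analysis and can be checked field by field, while the edited transcript stays readable as a text and can feed a digital or print edition.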

Page 10: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Tagging

● Too small
● Too imprecise

Page 11: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Origins (Traditions)

● OCR Correction
● Documentary Editing
● Genealogy
● Natural Science
● Astronomy

Split this into 5 slides

Page 12: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Online Tools

● Recent (none older than 2005)
● Influenced by origin
● Still pretty raw
● Most require tech expertise for set-up and customization
● All require making trade-offs

Page 13: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Lab Session 1: Breadth

NYPL What's on the Menu: Indexing
Wikisource: Editing

Page 14: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Selection Factors

● Source Material
● Transcript Purpose
● Organizational/Project Management Fit
● Financial and Technical Resources

Page 15: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Source Material

Evaluating your source material:
● Is it of interest to anyone else?
● Is it under copyright?
● Does it need restricted access?
● Is it composed of documents or records?
● Is it non-textual?
● How complex is the layout? How important is that layout?

Page 16: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Purpose

How will you be using the transcribed data?
● Traditional print editions
● Searchable online editions
● Do you want to use the system to analyze the text?
● How do you want to analyze the text?
● Is public engagement a goal?
● Should the transcripts be open?

Page 17: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Organizational/Project Management Fit

● How important is traditional editorial workflow?

● Will you rely on volunteers? How will you motivate them?

● What is the duration of the project?

● Is there a "final version"?

● Is TEI a mandate?

Page 18: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Financial and Technical Resources

Do you have or need:
● System administrators to install non-hosted software?
● Money to pay hosting costs?
● Programming skills to customize a tool?
● Money to pay programmers for customization?
● Support for on-going costs to keep the site running, however small?

Page 19: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Lab Session 2: Markup Options

FromThePage
TranscribeBentham

Page 20: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Technical Questions to Answer

● Where are the images now?
● How do images get into the system?
● How do transcripts get out of the system?
● How mature is the underlying technology?
● How configurable is the technology?
● How does the system work with the public face of your project?
● Where does the metadata live?
● Who will maintain this? How long?
● How many sites are using this system?

Page 21: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Wikisource

Pro:
● MediaWiki plus its add-on modules (e.g. print-on-demand, export).
● Wikimedia community.
● Incredibly mature.
Con:
● Wikimedia policy.
● Public editing.
● Limited mark-up.
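One practical consequence of building on MediaWiki is that transcripts can be pulled back out through the standard MediaWiki web API. A minimal sketch, assuming the English Wikisource endpoint and a hypothetical page title (`Page:Example.djvu/1`); it only builds the request URL and does not contact the network.

```python
from urllib.parse import urlencode


def wikitext_url(title: str, api: str = "https://en.wikisource.org/w/api.php") -> str:
    """Build a MediaWiki API URL that asks for the current wikitext of one page."""
    params = {
        "action": "query",      # read-only query module
        "prop": "revisions",    # ask for revision data
        "rvprop": "content",    # include the wikitext itself
        "rvslots": "main",      # main content slot of the revision
        "titles": title,        # page to fetch (hypothetical in this example)
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


# Hypothetical page title, used only to show the shape of the request.
url = wikitext_url("Page:Example.djvu/1")
print(url)
```

The same pattern should apply to other MediaWiki-based tools by pointing `api` at that site's own `api.php`, though each installation may restrict or configure API access differently.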

Page 22: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Bentham Transcription Desk

Pro:
● MediaWiki is very mature.
● TEI Toolbar (can also be used on other systems)
● Deployed outside original project.
Con:
● Development efforts halted.
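To give a sense of what TEI mark-up of a transcribed line involves, here is a minimal sketch that builds a small fragment with Python's standard library. The elements `del`, `add` and `unclear` are real TEI elements for deletions, additions and doubtful readings; the sentence itself is invented, and this is not a reproduction of the TEI Toolbar's actual output.

```python
import xml.etree.ElementTree as ET

# Build one paragraph of a hypothetical transcription.
p = ET.Element("p")
p.text = "I saw "

deleted = ET.SubElement(p, "del")      # text struck out in the manuscript
deleted.text = "too"
deleted.tail = " "

added = ET.SubElement(p, "add")        # text inserted by the writer
added.text = "two"
added.tail = " "

unclear = ET.SubElement(p, "unclear")  # reading the transcriber is unsure of
unclear.text = "ships"
unclear.tail = " today"

tei_fragment = ET.tostring(p, encoding="unicode")
print(tei_fragment)
```

A toolbar saves volunteers from typing these tags by hand, but the underlying transcript is still XML of this kind.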

Page 23: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Scripto

Pro:
● Team at CHNM has a great track record.
● Your CMS is your public face.
● MediaWiki is very mature.
● Deployed and under active development.
Con:
● Your CMS handles all metadata.
● Mark-up is extremely limited.

Page 24: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

FromThePage

Pro:
● Designed for intensive editing and indexing.
● Semantic mark-up and analysis.
● Hosting available.
Con:
● Single developer (me).
● No TEI mark-up.
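Semantic mark-up of this kind can be illustrated with a short sketch. It assumes a double-bracket wiki-link convention of the form `[[canonical subject|verbatim text]]`; the slides do not spell out the syntax, so treat the pattern, the function names and the sample sentence as assumptions made for illustration.

```python
import re

# Matches [[Canonical Name|verbatim text]] or the short form [[Canonical Name]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def subjects(transcript):
    """Return (canonical subject, verbatim text) pairs found in a transcript."""
    pairs = []
    for match in LINK.finditer(transcript):
        canonical = match.group(1).strip()
        verbatim = (match.group(2) or match.group(1)).strip()
        pairs.append((canonical, verbatim))
    return pairs


def plain_text(transcript):
    """Strip the mark-up, leaving only the words as written on the page."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), transcript)


# Hypothetical transcript line, invented for illustration.
SAMPLE = "Went to [[Danville, Virginia|town]] with [[John Smith|Mr. Smith]]."

found = subjects(SAMPLE)
reading_text = plain_text(SAMPLE)
```

Keeping the canonical subject separate from the verbatim wording is what lets one transcript serve both as a readable text and as the source of an index.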

Page 25: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Islandora TEI Editor

Caveat: I don't know much about this tool or this team.
● Based on Drupal and Fedora
● Supports TEI via friendly interface
● Many Drupal-based projects considering it.

Page 26: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

T-PEN

Caveat: I don't know much about this tool.
● Designed for medieval manuscripts.
● Supports TEI natively.
● Line-by-line interface.
● Hosted version available.

Page 27: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Scribe

Pro:
● Excellent for complex layout or non-documentary transcription.
● Zooniverse team is large, well-funded, experienced.
● Configurable.
Con:
● No automated tool for loading images or viewing transcript database (yet!)
● No concept of image-as-a-text.

Page 28: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Pybossa

Caveat: I don't know much about this tool or this team.
● Open Knowledge Foundation's crowdsourcing task management tool.
● Designed for tabular data.
● Google Spreadsheet data entry.
● Extremely young.

Page 29: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

TextLab

Caveat: I don't know much about this tool or this team.
● Melville Electronic Library.
● Direct addition of TEI tags to image.

Page 30: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Lab Session 3: Configuration

Scribe
Old Weather, What's the Score, Development deployments

Page 31: Roots and Routes: Crowdsourced Manuscript Transcription Workshop

Find me

Ben Brumfield
[email protected]

http://manuscripttranscription.blogspot.com/
@benwbrum