Quality and Consistency in Text Alignment
James R. Covington ([email protected])
Miklal Software Solutions
Aug 17, 2015
Text alignment: utility as a function of quality and consistency
Text alignment: utility
Machine translation
Text comparison
Preaching
Education in biblical languages
Textual criticism
Translation technique
Lexicography
Biblical interpretation
James R. Covington | Miklal Software Solutions | [email protected]
Text alignment: quality and consistency
Machine translation: lots of data
Text comparison: big picture
Preaching: bad sermon
Education in biblical languages: bad exam
Textual criticism: bad research
Translation technique
Lexicography
Biblical interpretation
Text alignment: quality and consistency
Part 1: writing consistency standards
Part 2: designing a software tool to promote consistency
Part 3: post-processing quality control
Writing consistency standards: guidelines for evaluating quality and consistency
Writing consistency standards
Step 1: Engineering (our focus today)
Step 2: Proofing
Step 3: Revising
Engineering: principles
Principle 1: as small as possible
“Each set of tokens being linked should be as small as possible.”
Principle 2: as large as necessary
“Each set of tokens being linked should be as large as necessary.”
Principle 1 > Principle 2
Engineering: principles
Principle 1: as small as possible
Gen 12:4
Engineering: principles
Principle 2: as large as necessary
Ex 34:6
Engineering: principles
Principle 2: as large as necessary
Greek: ἐν γαστρὶ ἔχουσα (lemmas: ἐν, γαστήρ, ἔχω)
English: to be with child
Matt 1:18
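The two principles can be illustrated with a small data structure. This is only a sketch: the Link class and its field names are hypothetical and are not taken from the tool presented here. It stores the Matt 1:18 idiom as one link whose groups are as large as the idiom requires, next to an illustrative one-to-one link.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Link:
    """One alignment link between a group of source tokens and a group of target tokens.

    Hypothetical structure for illustration only.
    """
    source: Tuple[str, ...]
    target: Tuple[str, ...]

    def size(self) -> int:
        # Total number of tokens in the link; Principle 1 prefers smaller values.
        return len(self.source) + len(self.target)


# Principle 2: the Greek idiom only means "to be with child" as a whole,
# so all three Greek tokens are grouped with all four English tokens.
idiom = Link(source=("ἐν", "γαστρὶ", "ἔχουσα"), target=("to", "be", "with", "child"))

# Principle 1: where a one-to-one link is possible, prefer it.
# (Illustrative word-level link, not taken from the slides.)
simple = Link(source=("Τίτον",), target=("Titus",))

print(idiom.size(), simple.size())
```

Under these assumptions the idiom link has size 7 and the one-to-one link has size 2.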
Engineering: case-specific rules
Step 1: Identify grammatical structures in source language.
Step 2: Identify grammatical structures in target language used to translate structures from Step 1.
Step 3: Write a rule for each pair of grammatical structures.
Engineering: case-specific rules
Step 1: Identify grammatical structures in source language.
Function words: Articles, Univ. Quantifier, Prepositions, Conjunctions
Substantives: Nouns, Pronouns, Adjectives
Verbs: Auxiliaries, Subjects, Objects, Finite, Volitional, Infinitives, Participles
Punctuation: Quotation mark, Question mark
Engineering: case-specific rules
Step 1: Identify grammatical structures in source language.
Function words: Articles, Univ. Quantifier, Prepositions, Conjunctions
Substantives: Nouns, Pronouns, Adjectives
Pronouns: Personal, Reflexive, Possessive, Reciprocal, Demonstrative, Relative, Interrogative, Indefinite, Correlative
Engineering: case-specific rules
Step 1: Hebrew structure
ל + infinitive construct
Gen 2:15
Engineering: case-specific rules
Step 2: English structures
Case 1: English to + infinitive
Case 2: English infinitive
Gen 2:15
Engineering: case-specific rules
Step 3: Rules
Case 1: English to + infinitive
Rule 1: link separately
Case 2: English infinitive
Rule 2: group ל and infinitive; infinitive is primary
Gen 2:15
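The two rules for ל + infinitive construct can be sketched as a function. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the simplified Hebrew and English tokens are hypothetical, not the author's data model or the exact tokens on the slide.

```python
from typing import Dict, List


def link_lamed_infinitive(hebrew: List[str], english: List[str]) -> List[Dict]:
    """Apply Rule 1 or Rule 2 to a ל + infinitive construct pair.

    hebrew  -- [lamed, infinitive construct]
    english -- the English tokens that translate the structure
    (Hypothetical input format, for illustration only.)
    """
    lamed, infinitive = hebrew
    if len(english) == 2 and english[0].lower() == "to":
        # Case 1: English "to" + infinitive -> Rule 1: link separately.
        return [
            {"hebrew": [lamed], "english": [english[0]], "primary": lamed},
            {"hebrew": [infinitive], "english": [english[1]], "primary": infinitive},
        ]
    # Case 2: bare English infinitive -> Rule 2: group ל with the infinitive
    # and mark the infinitive as the primary token of the group.
    return [{"hebrew": [lamed, infinitive], "english": list(english), "primary": infinitive}]


# Simplified, illustrative tokens.
case1 = link_lamed_infinitive(["ל", "עבד"], ["to", "work"])
case2 = link_lamed_infinitive(["ל", "שמר"], ["keep"])
print(len(case1), len(case2))
```

Case 1 yields two separate links; Case 2 yields one grouped link whose primary token is the infinitive.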
Engineering: case-specific rules
Step 1: Greek structure
circumstantial participle: ἔχων ὑπʼ ἐμαυτὸν στρατιώτας
Step 2: English structures
participle phrase
subordinate clause
main clause
prepositional phrase
preposition
Luke 7:8
Engineering: case-specific rules
ἔχων ὑπʼ ἐμαυτὸν στρατιώτας
Step 2: English structures
participle phrase: having soldiers under myself
subordinate clause: since I have soldiers under myself
main clause: and I have soldiers under myself
prepositional phrase: in having soldiers under myself
preposition: with soldiers under me (ESV)
Luke 7:8
Engineering: case-specific rules
Step 3: Rules
Case 1: participle phrase (Gal 2:1)
συμπαραλαβὼν καὶ Τίτον: taking Titus along with me
Case 2: subordinate clause (Gal 2:3)
Ἕλλην ὤν: though he was a Greek
Proofing and Revising
Proofing: multiple readers; time; consult work of other alignments
Revising: begin alignment; note problem spots; note undefined cases; revise and expand cases/rules
Designing a software tool: an environment to facilitate accuracy and consistency
Designing a software tool: goals
clarity: understand alignment correctly; find errors easily
speed: make changes quickly; dig deeper quickly
comparison: find parallels to check for consistency
Designing a software tool: demo
[demo tool]
Post-processing: checking for accuracy and consistency
Post-processing: philosophy
Find as many algorithmically-detectable mistakes as possible.
Recall > Precision
Precision (low is acceptable): some % of hits are false
Recall (high): % of mistakes caught
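The trade-off can be made concrete with the standard definitions: precision is true hits divided by all hits, and recall is true hits divided by all actual mistakes. The sketch below uses made-up link IDs; the function name and the sample numbers are illustrative assumptions, not results from the project.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[int], actual_mistakes: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for a checker's hits against the known mistakes."""
    true_hits = flagged & actual_mistakes
    precision = len(true_hits) / len(flagged) if flagged else 1.0
    recall = len(true_hits) / len(actual_mistakes) if actual_mistakes else 1.0
    return precision, recall


# Hypothetical link IDs: the checker flags 8 links, 3 of which are real mistakes,
# and it misses 1 of the 4 real mistakes.
flagged = {1, 2, 3, 4, 5, 6, 7, 8}
actual = {2, 5, 8, 9}
p, r = precision_recall(flagged, actual)
print(p, r)
```

Here precision is 0.375 and recall is 0.75. A checker tuned for recall accepts many false hits for a human to dismiss, as long as few real mistakes slip through.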
Post-processing: techniques
1. Natural Language Processing: conformity to consistency rules
uncommon links
improbable links
consistent treatment of n-grams
2. Graph theory: consistent primary status
Natural language processing: rules
Articles
Gen 1:27
Zech 1:10
2 Sam 15:6
Natural language processing: rules
Verbs: Are auxiliaries grouped with main verbs? Do main verbs receive primary status?
Of: Is “of” grouped with nomen regens (construct)?
Waw: Is waw grouped with conjunctions that follow it?
Natural language processing: rules
Hebrew definite direct object marker (את)
always unlinked (unless interpreted as preposition)
Jer 10:1
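A rule like this one is simple to check mechanically. The sketch below assumes a hypothetical token format (a dictionary with ref, lemma, pos, and linked_to keys) and made-up sample data; it is not the checker demonstrated in the talk.

```python
from typing import Dict, List

DIRECT_OBJECT_MARKER = "את"


def find_linked_object_markers(tokens: List[Dict]) -> List[Dict]:
    """Return every את token that is linked even though it is not tagged as a preposition."""
    return [
        t for t in tokens
        if t["lemma"] == DIRECT_OBJECT_MARKER
        and t["pos"] != "preposition"
        and t["linked_to"]  # a non-empty list means the token has a link
    ]


# Hypothetical sample tokens; the refs are placeholders, not verse references.
tokens = [
    {"ref": "ex1", "lemma": "את", "pos": "object marker", "linked_to": []},
    {"ref": "ex2", "lemma": "את", "pos": "object marker", "linked_to": ["the"]},
    {"ref": "ex3", "lemma": "את", "pos": "preposition", "linked_to": ["with"]},
]
hits = find_linked_object_markers(tokens)
print([t["ref"] for t in hits])
```

Only the second token is reported: the first obeys the rule, and the third falls under the preposition exception.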
Natural language processing: rules
[demo Hebrew definite direct object checker]
Natural language processing: context
Uncommon link checker: global context
common tokens, uncommon link
Improbable link checker: local context
more probable link (“unstable marriage”)
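One possible reading of the two checkers is sketched below. It is an assumption-laden illustration, not the demonstrated tool: the function names, thresholds, link scores, and placeholder tokens are all hypothetical. The first function uses counts over the whole corpus (global context); the second looks inside one verse (local context) for a pair that would score higher than both links it would break up, which is the "blocking pair" idea from the stable marriage problem.

```python
from collections import Counter
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (source token, target token)


def uncommon_links(links: List[Link], min_token_count: int = 3,
                   max_link_count: int = 1) -> List[Link]:
    """Global context: flag rare pairings of tokens that are themselves common."""
    pair_counts = Counter(links)
    source_counts = Counter(s for s, _ in links)
    target_counts = Counter(t for _, t in links)
    return sorted(
        pair for pair, n in pair_counts.items()
        if n <= max_link_count
        and source_counts[pair[0]] >= min_token_count
        and target_counts[pair[1]] >= min_token_count
    )


def blocking_pairs(verse_links: List[Link], score: Dict[Link, float]) -> List[Link]:
    """Local context: find unlinked pairs that outscore both links they would break up."""
    found = []
    for s, t_current in verse_links:
        for s_other, t in verse_links:
            if (s, t_current) == (s_other, t):
                continue  # same link, nothing to compare
            candidate = score.get((s, t), 0.0)
            if (candidate > score.get((s, t_current), 0.0)
                    and candidate > score.get((s_other, t), 0.0)):
                found.append((s, t))
    return found


# Placeholder tokens: "a" and "b" are source tokens, "x" and "y" target tokens.
corpus = [("a", "x")] * 3 + [("b", "y")] * 3 + [("a", "y")]
rare = uncommon_links(corpus)

verse = [("a", "y"), ("b", "x")]
scores = {("a", "x"): 0.9, ("b", "y"): 0.8, ("a", "y"): 0.1, ("b", "x"): 0.1}
unstable = blocking_pairs(verse, scores)
print(rare, unstable)
```

In this toy data the link ("a", "y") is flagged as uncommon, and the verse is flagged because ("a", "x") and ("b", "y") are both more probable than the links actually made. In practice the scores would have to be estimated from the aligned corpus itself.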
Natural language processing: context
[demo uncommon and improbable link checker]
N-grams: consistent alignment
4-gram (Hebrew)
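A consistency check over n-grams could work roughly as follows. This is a sketch under assumptions: each verse is represented as a hypothetical list of (source token, linked target text) pairs, and the sample data uses placeholder tokens with n = 2 for brevity rather than a real Hebrew 4-gram.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One aligned verse: (source token, linked target text) pairs in source order.
AlignedVerse = List[Tuple[str, str]]


def inconsistent_ngrams(verses: List[AlignedVerse],
                        n: int = 4) -> Dict[Tuple[str, ...], Set[Tuple[str, ...]]]:
    """Return source n-grams whose occurrences are aligned in more than one way."""
    patterns = defaultdict(set)
    for verse in verses:
        for i in range(len(verse) - n + 1):
            window = verse[i:i + n]
            ngram = tuple(src for src, _ in window)
            patterns[ngram].add(tuple(tgt for _, tgt in window))
    return {ngram: seen for ngram, seen in patterns.items() if len(seen) > 1}


# Placeholder data: the same source bigram is linked token by token twice,
# and once with the first token left unlinked.
verses = [
    [("s1", "the"), ("s2", "king")],
    [("s1", "the"), ("s2", "king")],
    [("s1", ""), ("s2", "the king")],
]
result = inconsistent_ngrams(verses, n=2)
print(result)
```

Every n-gram returned is only a candidate for review, since a different translation of the same source text can legitimately call for a different alignment.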
Graph theory: primary status of שם “name”
Example groups linked to שם “name”
name
a/the name
the name of
a name for
renown
was named
she named
he called … name
Graph theory: primary status of שם “name”
Goal: simple directed graph (i.e. no loops)
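The slides do not spell out how the graph is built, so the sketch below rests on an assumption: each group contributes an edge from every non-primary token to the token marked primary, and a cycle then means primary status was assigned inconsistently across groups. The function names and the primary assignments in the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Each group: (tokens in the group, the token marked primary).
Group = Tuple[List[str], str]


def build_graph(groups: List[Group]) -> Dict[str, Set[str]]:
    """Add an edge from every non-primary token to the group's primary token."""
    graph: Dict[str, Set[str]] = {}
    for tokens, primary in groups:
        graph.setdefault(primary, set())
        for token in tokens:
            if token != primary:
                graph.setdefault(token, set()).add(primary)
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search with three colours; reaching a grey node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


# Hypothetical primary assignments for groups linked to שם "name".
consistent = [(["the", "name", "of"], "name"), (["a", "name", "for"], "name")]
inconsistent = consistent + [
    (["he", "called", "name"], "called"),  # "called" primary over "name"
    (["called", "name"], "name"),          # "name" primary over "called"
]
print(has_cycle(build_graph(consistent)), has_cycle(build_graph(inconsistent)))
```

The first set of groups gives a graph without cycles; the second contains name → called → name and would be flagged for review.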
Graph theory: שוב (qal) “return”
Some graphs get complicated.
Conclusions
1. Text alignment is useful inasmuch as it is accurate and consistent.
2. Achieving quality and consistency requires multiple strategies:
a. writing consistency standards (before)
b. software design (during)
c. post-processing (after)