Two sides of the same coin: assessing translation quality through adequacy and acceptability error analysis
Joke Daems (joke.daems@ugent.be), www.lt3.ugent.be/en/projects/robot
Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker

Mar 14, 2016

Transcript
Page 1: Two sides of the same coin: assessing translation quality through adequacy and acceptability error analysis

Joke Daems (joke.daems@ugent.be)

www.lt3.ugent.be/en/projects/robot

Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker

Page 2: What makes error analysis so complicated?

“There are some errors for all types of distinctions, but the most problematic distinctions were for adequacy/fluency and seriousness.” - Stymne & Ahrenberg, 2012

Does a problem concern adequacy, fluency, both, neither?

How do we determine the seriousness of an error?

Page 3: Two types of quality

“Whereas adherence to source norms determines a translation's adequacy as compared to the source text, subscription to norms originating in the target culture determines its acceptability.” - Toury, 1995

Why mix?

Page 4: 2-step TQA approach

Quality Assessment

1) Acceptability = target norms

2) Adequacy = target vs. source

Page 5: Subcategories

Acceptability: Grammar & Syntax, Lexicon, Spelling & typos, Style & register, Coherence

Adequacy: Contradiction, Deletion, Addition, Word sense, Meaning shift

Page 6: Acceptability: fine-grained

Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar - other

Lexicon: wrong preposition, wrong collocation, word nonexistent

Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo

Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style - other

Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence - other

Page 7: Adequacy: fine-grained

Meaning shift: contradiction, word sense disambiguation, hyponymy, hyperonymy, terminology, quantity, time, meaning shift caused by punctuation, meaning shift caused by misplaced word, deletion, addition, explicitation, coherence, inconsistent terminology, other
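
The two-level typology on the preceding slides can be held in a nested mapping, which makes it easy to check which quality dimension a label belongs to. The sketch below is only an illustration: the names `TAXONOMY` and `locate` are hypothetical, and the label strings are copied from the slides rather than from any official label set.

```python
# Hypothetical representation of the error typology listed on the slides.
TAXONOMY = {
    "acceptability": {
        "grammar & syntax": [
            "article", "comparative/superlative", "singular/plural",
            "verb form", "article-noun agreement", "noun-adj agreement",
            "subject-verb agreement", "reference", "missing", "superfluous",
            "word order", "structure", "grammar - other",
        ],
        "lexicon": ["wrong preposition", "wrong collocation", "word nonexistent"],
        "spelling & typos": [
            "capitalization", "spelling mistake", "compound", "punctuation", "typo",
        ],
        "style & register": [
            "register", "untranslated", "repetition", "disfluent",
            "short sentences", "long sentence", "text type", "style - other",
        ],
        "coherence": [
            "conjunction", "missing info", "logical problem", "paragraph",
            "inconsistency", "coherence - other",
        ],
    },
    "adequacy": {
        "meaning shift": [
            "contradiction", "word sense disambiguation", "hyponymy",
            "hyperonymy", "terminology", "quantity", "time",
            "meaning shift caused by punctuation",
            "meaning shift caused by misplaced word", "deletion", "addition",
            "explicitation", "coherence", "inconsistent terminology", "other",
        ],
    },
}


def locate(label):
    """Return every (dimension, subcategory) pair under which `label` is listed."""
    return [
        (dimension, subcategory)
        for dimension, subcategories in TAXONOMY.items()
        for subcategory, labels in subcategories.items()
        if label in labels
    ]
```

For example, `locate("compound")` places the label under acceptability (spelling & typos), while `locate("deletion")` places it under adequacy.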

Page 8: How serious is an error?

“Different thresholds exist for major, minor and critical errors. These should be flexible, depending on the content type, end-user profile and perishability of the content.” - TAUS, error typology guidelines, 2013

Give different weights to error categories depending on text type & translation brief
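
A minimal sketch of what such flexible weighting could look like in practice. The weight values, the text-type keys, and the function name `weighted_error_score` are assumptions made for illustration; the slides give no numeric weights.

```python
from collections import Counter

# Hypothetical weights per text type; the slides do not specify any values.
WEIGHTS = {
    "technical": {"terminology": 3.0, "typo": 1.0, "register": 0.5},
    "newspaper": {"terminology": 1.0, "typo": 1.0, "register": 2.0},
}
DEFAULT_WEIGHT = 1.0  # assumed weight for categories without an explicit entry


def weighted_error_score(errors, text_type):
    """Sum the annotated errors, each multiplied by its weight for `text_type`."""
    weights = WEIGHTS[text_type]
    counts = Counter(errors)
    return sum(n * weights.get(category, DEFAULT_WEIGHT) for category, n in counts.items())


errors = ["terminology", "terminology", "typo", "deletion"]
technical_score = weighted_error_score(errors, "technical")
newspaper_score = weighted_error_score(errors, "newspaper")
```

With these assumed weights the same four errors cost more in a technical text than in a newspaper article, because terminology errors are weighted more heavily there.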

Page 9: Reducing subjectivity

• Flexible error weights

• More than one annotator

• Consolidation phase

Page 10: TQA: Annotation (brat)

1) Acceptability

2) Adequacy
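
brat stores annotations in a standoff `.ann` file, where each text-bound annotation is a tab-separated line with an ID, a label plus character offsets, and the covered text. The sketch below reads such lines; the label `word_sense`, the sample note, and the helper names are hypothetical, and discontinuous spans are not handled.

```python
from typing import NamedTuple


class Span(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann):
    """Parse contiguous text-bound annotations (T lines) from brat standoff text."""
    spans = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, relations, attributes and other line types
        ann_id, label_and_offsets, text = line.split("\t", 2)
        # Raises ValueError on discontinuous spans ("0 5;10 15"), which this sketch ignores.
        label, start, end = label_and_offsets.split(" ")
        spans.append(Span(ann_id, label, int(start), int(end), text))
    return spans


# Hypothetical example based on the MT output quoted in this presentation.
SENTENCE = "Veranderingen in de omgeving die het vegen van de planeet tot stand brengen"
SAMPLE_ANN = (
    "T1\tword_sense 33 42\thet vegen\n"
    "#1\tAnnotatorNotes T1\tliteral rendering of 'sweeping'\n"
)
spans = parse_text_bound(SAMPLE_ANN)
```

Keeping acceptability and adequacy labels in separate annotation passes, as the two steps above suggest, would simply mean parsing two such files per translation.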

Page 11: Application example: comparative analysis

[Figure: four horizontal bar charts of the most frequent problem categories, with the horizontal axis in percent. Bar values are not recoverable from the transcript.]

• Top HT problems, newspaper articles (axis 0% to 12%): wrong collocation, word sense, deletion, punctuation, other meaning shift

• Top PE problems, newspaper articles (axis 0% to 12%): punctuation, other meaning shift, compound, typo, word sense, wrong collocation

• Top HT problems, technical texts (axis 0% to 20%): wrong collocation, untranslated, other meaning shift, compound, logical problem, terminology

• Top PE problems, technical texts (axis 0% to 18%): other meaning shift, untranslated, article, logical problem, terminology, compound

Page 12: Next step: diagnostic & comparative evaluation

• What makes a ST-passage problematic?
• How problematic is this passage really? (i.e., how many translators make errors)
• Which PE errors are caused by MT?
• Which MT errors are hardest to solve?

Link all errors to corresponding ST-passage

Page 13: Source text-related error sets

• ST: Changes in the environment that are sweeping the planet...

• MT: Veranderingen in de omgeving die het vegen van de planeet tot stand brengen... (wrong word sense) "Changes in the environment that bring about the brushing of the planet..."

• PE1: Veranderingen in de omgeving die het evenwicht op de planeet verstoren... (other type of meaning shift) "Changes in the environment that disturb the balance on the planet..."

• PE2: Veranderingen in de omgeving die over de planeet rasen... (wrong collocation + spelling mistake) "Changes in the environment that raige over the planet..."
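
The example above can be sketched as a grouping of error labels by source passage and translation, which also answers how many translations contain an error for that passage. The tuple layout and the function names are assumptions for illustration, not the project's actual data format.

```python
from collections import defaultdict

# (source passage, translation id, error label), taken from the example above.
annotations = [
    ("sweeping the planet", "MT", "wrong word sense"),
    ("sweeping the planet", "PE1", "other meaning shift"),
    ("sweeping the planet", "PE2", "wrong collocation"),
    ("sweeping the planet", "PE2", "spelling mistake"),
]


def build_error_sets(annotations):
    """Group error labels by source passage, then by translation."""
    sets = defaultdict(lambda: defaultdict(list))
    for passage, translation, label in annotations:
        sets[passage][translation].append(label)
    return sets


def translations_with_errors(sets, passage):
    """Count the translations that have at least one error linked to `passage`."""
    return len(sets.get(passage, {}))


error_sets = build_error_sets(annotations)
n_affected = translations_with_errors(error_sets, "sweeping the planet")
```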

Page 14: Application example: impact of MT errors on PE

[Figure: two bar charts, "Top 10 MT errors newspaper articles" and "Top 10 MT errors technical texts", each with a vertical axis from 0 to 30. Bar heights are not recoverable from the transcript. Category labels that survive: compound, terminology, article, logical problem, other meaning shift, deletion, structure, verb form, missing constituent, word order.]

Page 15: Summary

• Improve error analysis by:

– judging acceptability and adequacy separately

– making error weights depend on translation brief

– having more than one annotator

– introducing consolidation phase

• Improve diagnostic and comparative evaluation by:

– linking errors to ST-passages

– taking number of translators into account

Page 16: Open questions

• How can we reduce annotation time?

– Ways of automating (part of) the process?

– Limit annotation to a subset of errors?

• How to better implement ST-related error sets?

– Ways of automatically aligning ST, MT, and various TTs at word level?

Page 17: Thank you for listening

For more information, contact: joke.daems@ugent.be

Suggestions? Questions?

Page 18: Quantification of ST-related error sets

ST
  MT (1)
    MT1 (0.5): wrong word sense (0.5)
    MT2 (0.5)
  PE (1)
    PE1 (0.5): other meaning shift (0.5)
    PE2 (0.5): wrong collocation (0.25), spelling mistake (0.25)
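
One reading of the numbers above is that each condition (MT, PE) carries a total weight of 1, split equally over its translations, and that each translation's share is split equally over its errors. The sketch below implements that reading; it is an assumption about the scheme, and the function name `error_weights` is hypothetical.

```python
def error_weights(condition):
    """Distribute a total weight of 1 over the errors of one condition.

    `condition` maps a translation id to its list of error labels (possibly empty).
    """
    share = 1.0 / len(condition)  # equal share per translation
    weights = {}
    for translation, errors in condition.items():
        for label in errors:
            # A translation's share is divided equally over its errors.
            weights[(translation, label)] = share / len(errors)
    return weights


mt_weights = error_weights({"MT1": ["wrong word sense"], "MT2": []})
pe_weights = error_weights({
    "PE1": ["other meaning shift"],
    "PE2": ["wrong collocation", "spelling mistake"],
})
```

Under this reading an error-free translation such as MT2 keeps its 0.5 share without contributing any error weight, so the MT condition sums to 0.5 while the PE condition sums to 1.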

Page 19: Inter-annotator agreement

Condition | Experiment | Initial agreement | Agreement after consolidation | Correlation between annotators | Agreement on categories
HT&PE acceptability | Exp1 | 39% (κ=0.32) | 67% (κ=0.65) | r=0.67, n=38, p<0.001 | 90% (κ=0.89)
HT&PE acceptability | Exp2 | 50% (κ=0.44) | 81% (κ=0.80) | r=0.95, n=34, p<0.001 | 89% (κ=0.88)
HT&PE adequacy | Exp1 | 42% (κ=0.31) | 82% (κ=0.79) | r=0.87, n=38, p<0.001 | 89% (κ=0.87)
HT&PE adequacy | Exp2 | 46% (κ=0.30) | 94% (κ=0.92) | r=0.86, n=34, p<0.001 | 88% (κ=0.83)
MT acceptability | Exp1 | 53% (κ=0.49) | 84% (κ=0.83) | n/a | 83% (κ=0.81)
MT acceptability | Exp2 | 79% (κ=0.77) | 95% (κ=0.94) | n/a | 93% (κ=0.93)
MT adequacy | Exp1 | 57% (κ=0.46) | 94% (κ=0.92) | n/a | 86% (κ=0.79)
MT adequacy | Exp2 | 51% (κ=0.41) | 86% (κ=0.83) | n/a | 86% (κ=0.82)
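
The slides report κ next to raw agreement but do not say which variant was computed. As a reference point, the sketch below implements Cohen's kappa for two annotators labelling the same items; treating the reported values as Cohen's kappa is an assumption, and the sample labels are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: the annotators agree on 3 of 4 items.
annotator_1 = ["typo", "typo", "compound", "compound"]
annotator_2 = ["typo", "typo", "compound", "typo"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here raw agreement is 75% while κ is 0.5, which shows how chance correction lowers the figure when few categories are in play.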