Measuring Translation Quality in Today’s Automated Lifecycle

Arle Lommel & Aljoscha Burchardt (DFKI) with help from Lucia Specia (University of Sheffield) and Hans Uszkoreit (DFKI)

Funded by the 7th Framework Programme of the European Commission through the contract 296347.

Feb 03, 2022


Page 2

PROBLEMS IN ASSESSING QUALITY

Page 3

95% of professionally translated content at one major LSP is never evaluated for quality.

Page 4

“Translators are the garbage collectors of the documentation world” –Alison Toon, HP

Page 5

“I know it when I see it”

Page 6

Machine Translation quality scores (BLEU, NIST, etc.) are totally different from human evaluation.

Page 7

MT methods require reference translations:

Cannot be used for production purposes

Page 8

Change the reference translation(s) and the score changes
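
To illustrate the point about reference dependence, here is a minimal sketch of a BLEU-style score. It is a simplified toy (clipped unigram and bigram precision with a brevity penalty), not the official BLEU implementation, and the function names and example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hyp, ref, n):
    """Share of hypothesis n-grams also found in the reference (counts clipped)."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def toy_bleu(hypothesis, reference, max_n=2):
    """Simplified BLEU-like score; an illustration, not the official metric."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    precisions = [clipped_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize hypotheses that are shorter than the reference
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return geo_mean * brevity


# Hypothetical example: one MT output, two equally valid references
mt_output = "the cat sat on the mat"
reference_a = "the cat sat on the mat"
reference_b = "a cat was sitting on the rug"

score_a = toy_bleu(mt_output, reference_a)
score_b = toy_bleu(mt_output, reference_b)
print(f"against reference A: {score_a:.2f}")
print(f"against reference B: {score_b:.2f}")
```

The MT output is identical in both calls. Only the reference changes, and the score drops sharply.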

Page 9

The problem with BLEU

No substantial improvement for human use

Page 10

The problem with BLEU

Substantial human improvement but no BLEU improvement

Page 11

Human quality assessment takes too much time.

Page 12

Sampling is random but errors are not.

Page 13

Wait a minute… What do you mean by quality?

Page 14

Quality: A New Definition

A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.

Source: Alan Melby

Page 15

Sounds simple, right?

Page 16

It’s actually quite radical and it drags translation kicking and screaming into the modern world of quality management

Page 17

Multidimensional Quality Metrics

Page 18

Why not use a shared metric?

Page 19

OK. Which one?

Page 20

LISA QA Model, SAE J2450, SDL TMS, Acrocheck, ApSIC XBench, CheckMate, QA Distiller, XLIFF:Doc, EN15038…

Page 21

All of them disagree* about what is important to quality

*The only thing they agree on is terminology

Page 22

(Probably because there is no single set of criteria that applies to all kinds of translation)

Page 23

There is no one-size-fits-all metric

Page 24

MQM provides a catalog of issue types suitable for various tasks

Page 25

The “full” MQM

Page 26

The “full” MQM

Page 27

Wait! Weren’t we trying to improve things?

(That looks like a bowl of noodles!)

Page 28

The MQM Core

Page 29

Accuracy and Fluency

What’s Verity?

Page 30

Verity provides a way to deal with the text in relation to the real world.

Page 31

You don’t use all of MQM (or its core): you use the parts you need.

Page 32

MQM for MT Diagnostics

Page 33

SAE J2450

Page 34

MQM lets you declare your quality metric in a shared vocabulary.
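
As a rough sketch of what "declaring a metric in a shared vocabulary" could look like in code: a metric picks a subset of issue types from a common catalog and assigns each a weight. The catalog below contains only the three branches named in these slides. The weights, the scoring formula, and all function names are hypothetical and are not taken from the MQM specification.

```python
# Excerpt of the shared catalog: only the branches named in the slides.
# The real MQM catalog is far larger.
CATALOG_EXCERPT = {"Accuracy", "Fluency", "Verity"}


def declare_metric(name, weights):
    """Declare a metric as a weighted subset of the shared issue types."""
    unknown = set(weights) - CATALOG_EXCERPT
    if unknown:
        raise ValueError(f"Not in the shared catalog: {sorted(unknown)}")
    return {"name": name, "weights": dict(weights)}


def penalty_per_100_words(metric, issue_counts, word_count):
    """Weighted issue count per 100 words (hypothetical scoring formula).

    Issue types the metric does not declare are ignored.
    """
    total = sum(metric["weights"].get(issue, 0) * count
                for issue, count in issue_counts.items())
    return 100.0 * total / word_count


# Hypothetical metric for MT diagnostics: Verity is deliberately left out
mt_diagnostics = declare_metric("MT diagnostics", {"Accuracy": 2, "Fluency": 1})

# Hypothetical annotation result for a 500-word sample
counts = {"Accuracy": 3, "Fluency": 4, "Verity": 1}
penalty = penalty_per_100_words(mt_diagnostics, counts, word_count=500)
print(penalty)
```

Because both parties draw on the same issue-type names, two metrics that weight things differently can still be compared item by item.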

Page 35

Dimensions help you decide what to check

(and also help you communicate with your LSP)

Page 36

No more assuming what the parties want or how to check it

Page 37

12 Dimensions (from ISO/TS 11669)

1. Language/locale
2. Subject field/domain
3. Terminology (source/target)
4. Text type
5. Audience
6. Purpose
7. Register
8. Target text style
9. Content correspondence
10. Output modality
11. File format
12. Production technology

Page 38

Open-source tools* to demonstrate MQM

*translate5 source code is published. Other tools’ code will be published in 2014

Page 39

An Online Tool for Building Dimensions and Metrics

http://www.qt21.eu/MQM

Page 40

Tabular Scorecard

http://www.qt21.eu/MQM

Page 41

Ergonomic Scorecard

http://www.qt21.eu/MQM

Page 42

translate5

DEMO: http://www.translate5.net

Page 43

Currently discussing with TAUS how to harmonize MQM and the DQF Error Typology

Page 44

Quality Estimation (QuEst)

Page 45

How can you evaluate MT quality when you don’t have reference translations?

Page 46

QuEst (Quality Estimation)

An open-source tool for estimating translation quality

Page 47

Quality Estimation (QE) Metrics

• Automatic metrics that provide an estimate of the quality of (machine) translated segments, without reference translations

• Quality defined according to the problem at hand:
  • Adequacy
  • Fluency
  • Post-editing effort, etc.

Page 48

Task-Based Quality

• Does it need human revision to achieve HT quality?

• Can a reader get the gist?

• How much effort is required to post-edit the text? (If we know this we have a business case for MT)

Page 49

QuEst Framework

• Open source tool for QE: http://www.quest.dcs.shef.ac.uk/

• E.g. predict 1-5 scores for post-editing effort:
  • 1 highest, 5 lowest

• English-Spanish news data, but can be used for other language pairs

• System built from 1,000 examples of translated segments annotated by humans

Page 50

Uses a set of bilingual training data* to establish linguistic baselines

* Uses source+MT for training, but can also use extra resources (e.g., language models trained on TM)

Page 51

Provides an estimate for how well the translation fits with your existing translations

Page 52

QuEst can rank multiple translations to find the best one

Page 53

QuEst Rating Example

Reliability: QuEst ratings differ from human-assigned scores by 0.61 on average
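
A figure like this can be computed as a mean absolute difference between predicted and human scores. The sketch below assumes that is how the 0.61 was obtained, which the slide does not state. The scores are invented for illustration and are not the study's data.

```python
def mean_absolute_error(predicted, human):
    """Average absolute gap between predicted and human-assigned scores."""
    if not predicted or len(predicted) != len(human):
        raise ValueError("Need two non-empty score lists of equal length")
    return sum(abs(p - h) for p, h in zip(predicted, human)) / len(predicted)


# Hypothetical scores on the 1-5 scale (not the study's data)
predicted_scores = [2.4, 3.1, 1.8, 4.5]
human_scores = [2, 4, 1, 5]

mae = mean_absolute_error(predicted_scores, human_scores)
print(f"average difference: {mae:.2f}")
```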

Page 54

No more random samples. QuEst can identify sentences that are likely to require human attention

Page 55

QuEst can tell you where it makes sense to post-edit and where it makes sense to start from scratch.
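
A minimal sketch of how predicted scores might drive this kind of routing, together with the ranking of alternative translations mentioned earlier. It reads the 1-5 scale as "1 is best", and the threshold of 3.0 is an arbitrary assumption. The function names and scores are hypothetical and this is not QuEst's API.

```python
def triage(scored_segments, post_edit_max=3.0):
    """Split (text, predicted_score) pairs into post-edit and retranslate lists.

    Assumes 1 is the best score and 5 the worst; the threshold is arbitrary.
    """
    post_edit, from_scratch = [], []
    for text, score in scored_segments:
        if score <= post_edit_max:
            post_edit.append(text)
        else:
            from_scratch.append(text)
    return post_edit, from_scratch


def best_candidate(candidates):
    """Pick the translation with the best (lowest) predicted score."""
    return min(candidates, key=lambda pair: pair[1])[0]


# Hypothetical predicted scores for three MT segments
segments = [("segment A", 1.6), ("segment B", 4.2), ("segment C", 2.9)]
to_post_edit, to_retranslate = triage(segments)

# Hypothetical alternative translations of one sentence from different systems
best = best_candidate([("system 1 output", 3.4), ("system 2 output", 2.1)])
print(to_post_edit, to_retranslate, best)
```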

Page 56

QuEst Improves Post-Editing Time

Language   PE speed without QE   PE speed with QE    Increase in PE productivity
FR→EN      0.75 words/second     1.09 words/second   45%
EN→ES      0.32 words/second     0.57 words/second   78%
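
The productivity percentages follow directly from the two speed columns. A short check of the arithmetic (the function name is ours):

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


fr_en = percent_increase(0.75, 1.09)  # words/second without and with QE
en_es = percent_increase(0.32, 0.57)

print(f"FR-EN: {fr_en:.0f}%")
print(f"EN-ES: {en_es:.0f}%")
```

Both results round to the figures in the table.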

Page 57

QuEst + MQM

Targeted quality evaluation combining the strengths of humans and machines

Page 58

MQM will be turned over to industry for long-term maintenance and eventual standardization

Page 59

Learn more at http://www.qt21.eu

Page 60

Detailed presentation covering MQM pilot study (in German) today at 5:15

Page 61

Join us tomorrow morning for a detailed demonstration of how to use MQM (9:45–10:30)

Page 62

Questions?