Quality Estimation Method HT vs MT Data analysis Conclusions
Predicting Human Translation Quality
Lucia Specia
University of Sheffield
[email protected]
QTLaunchPad Workshop, Dubrovnik
15 June 2014
(Joint work with Kashif Shah)
Outline
1 Quality Estimation
2 Method
3 HT vs MT
4 Data analysis
5 Conclusions
Overview
Translation quality estimation (QE): automatic metrics to provide an estimate of the quality of a translated text
No access to reference translations, MT systems in use
So far, only applied to machine-translated (MT) texts
Applications
Can a reader get the gist?
Is it worth post-editing it?
How much time to fix it?
Can we publish it as is?
Does it need human checking?
Learning
Supervised machine learning to build models based on training data:
  annotated with quality labels (human input at “training” time)
  described by features
“Quality” defined according to the problem (and data):
  Post-editing time for a sentence
  MQM issue for a word
Models predict such quality scores
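The learning setup above can be sketched in a few lines of Python. This is a minimal illustration, not the QuEst implementation: the three features, the sentence pairs, and the post-editing times are all invented for the example, and plain gradient-descent linear regression stands in for whatever learner is actually used.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def extract_features(source: str, target: str) -> List[float]:
    """Three toy sentence-level features (hypothetical, not QuEst's feature set)."""
    src_len = len(source.split())
    tgt_len = len(target.split())
    punct = ",.;:!?"
    src_punct = sum(ch in punct for ch in source)
    tgt_punct = sum(ch in punct for ch in target)
    return [
        float(src_len),                     # source length in tokens
        tgt_len / max(src_len, 1),          # target/source length ratio
        float(abs(src_punct - tgt_punct)),  # punctuation count mismatch
    ]


def standardise(rows: Sequence[Sequence[float]]):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    return bias + sum(w * v for w, v in zip(weights, x))


def mse(weights, bias, xs, ys) -> float:
    return mean((predict(weights, bias, x) - y) ** 2 for x, y in zip(xs, ys))


def train(xs, ys, lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit a linear model by batch gradient descent on squared error."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - y for x, y in zip(xs, ys)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * x[j] for e, x in zip(errors, xs)) / n
        bias -= lr * sum(errors) / n
    return weights, bias


# Invented training data: (source, translation, post-editing time in seconds).
TRAINING_DATA = [
    ("The report is ready.", "El informe está listo.", 2.0),
    ("Please send the invoice by Friday, if possible.",
     "Por favor envíe la factura el viernes si posible", 14.0),
    ("He left.", "Él se fue de la casa ayer por la noche.", 20.0),
    ("We will meet the customer tomorrow morning.",
     "Nos reuniremos con el cliente mañana por la mañana.", 4.0),
]

raw = [extract_features(src, tgt) for src, tgt, _ in TRAINING_DATA]
labels = [label for _, _, label in TRAINING_DATA]
xs, means, stds = standardise(raw)

mse_before = mse([0.0] * len(xs[0]), 0.0, xs, labels)
weights, bias = train(xs, labels)
mse_after = mse(weights, bias, xs, labels)

# Predict a quality score for an unseen translation, with no reference needed.
new_raw = extract_features("The meeting is cancelled.", "La reunión cancelada")
new_x = [(v - m) / s for v, m, s in zip(new_raw, means, stds)]
predicted_time = predict(weights, bias, new_x)
print(f"training MSE: {mse_before:.2f} -> {mse_after:.2f}")
print(f"predicted post-editing time: {predicted_time:.1f}s")
```

Changing the label (for example to a 1-3 quality score, or a word-level MQM issue tag with a classifier instead of a regressor) changes what "quality" means without changing the overall recipe.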
Can we predict HT quality?
Key objective in QTLP: (automated) metrics to evaluate and estimate translation quality of human and machine translations
MT quality estimation works well (at sentence-level):
  WMT12-14 shared tasks
  QuEst framework: www.quest.dcs.shef.uk
  Large number of recent papers
  Commercial adoption: Multilizer, SDL-LW, Yandex
Question: Can we apply the same framework to predict the quality of human translations?
Can we predict HT quality?
Motivation: Automate/sample for quality assurance
Encouraging fact: hard to distinguish MT and HT (EAMT-Tuesday). But:
1 Do (professional) human translators make mistakes?
2 Are HT errors the same/similar to MT errors?
3 Are current quality estimation tools good for HT?
Data analysis to answer these questions: sentence- and (partially) word-level
QTLP datasets
Not possible before QTLP: no large enough dataset available with both MTs and HTs
Our data is different from existing ’learner’ corpora:
  HTs produced by professionals
  2-3 state-of-the-art MT systems (RBMT, SMT, hybrid)
  Both HT and MT annotated by professional translators
  4 language-pairs
  News and ’customer’ data
Datasets also used for WMT14 QE shared tasks
QTLP datasets - sentence-level
Labels:
1 = Perfect translation, no post-editing needed at all
2 = Near miss translation: translation contains maximum of 2-3 errors, and possibly additional errors that can be easily fixed (capitalisation, punctuation, etc.)
3 = Very low quality translation, cannot be easily fixed
Sentences:
# Source          # HT+MTs    # Target
1,104 English     4           4,416 Spanish
500 English       4           2,000 German
500 German        3           1,500 English
500 Spanish       3           1,500 English
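The target counts in the table can be checked with a little arithmetic. The sketch below assumes the "# HT+MTs" column is the number of translations per source sentence (one HT plus the MT outputs), which is an interpretation of the table rather than something the slide states; the variable and function names are illustrative.

```python
# Rows copied from the sentence-level table:
# (source sentences, source language, translations per source, target sentences, target language)
ROWS = [
    (1104, "English", 4, 4416, "Spanish"),
    (500, "English", 4, 2000, "German"),
    (500, "German", 3, 1500, "English"),
    (500, "Spanish", 3, 1500, "English"),
]


def expected_targets(n_sources: int, n_versions: int) -> int:
    """Each source sentence contributes one target sentence per translation."""
    return n_sources * n_versions


for n_src, src_lang, n_versions, n_tgt, tgt_lang in ROWS:
    # The reported target count should equal sources x translations per source.
    assert expected_targets(n_src, n_versions) == n_tgt
    print(f"{src_lang}-{tgt_lang}: {n_src} x {n_versions} = {n_tgt}")

total_targets = sum(row[3] for row in ROWS)
print(f"total annotated target sentences: {total_targets}")  # 9416
```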