This document is part of the Coordination and Support Action “Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad)”. This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 296347.
Supplement 1
Practical Guidelines for the Use of MQM in
Scientific Research on Translation Quality
Author(s): Aljoscha Burchardt and Arle Lommel (DFKI)
Dissemination Level: Public
Date: 19.11.2014
This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Grant agreement no.: 296347
Project acronym: QTLaunchPad
Project full title: Preparation and Launch of a Large-scale Action for Quality Translation Technology
Funding scheme: Coordination and Support Action
Coordinator: Prof. Hans Uszkoreit (DFKI)
Start date, duration: 1 July 2012, 24 months
Distribution: Public
Contractual date of delivery: —
Actual date of delivery: 18 November 2014
Supplement number: 1
Supplement title: Practical Guidelines for the Use of MQM in Scientific Research on Translation Quality
Type: Report
Status and version: Final, v1.0
Number of pages:
Contributing partners: DFKI
Authors: Aljoscha Burchardt, Arle Lommel
EC project officer: Aleksandra Wesolowska

The partners in QTLaunchPad are:
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Dublin City University (DCU), Ireland
Institute for Language and Speech Processing, R.C. “Athena” (ILSP/ATHENA RC), Greece
The University of Sheffield (USFD), United Kingdom
For copies of reports, updates on project activities and other QTLaunchPad-related information, contact:

DFKI GmbH, QTLaunchPad
Dr. Aljoscha Burchardt (aljoscha.burchardt@dfki.de)
Alt-Moabit 91c, 10559 Berlin, Germany
Phone: +49 (30) 23895-1838
Fax: +49 (30) 23895-1810

Copies of reports and other material can also be accessed via http://www.qt21.eu/launchpad

© 2014, The Individual Authors
Table of Contents
1 Executive Summary
2 MQM Process
  2.1 Selecting a metric
  2.2 Selecting an Annotation Environment
  2.3 Selection of Annotators and Training
  2.4 Evaluation
  2.5 Analysis
3 Costs
4 Amount of text required
5 Training materials
  5.1 Decision trees
    5.1.1 A generalized decision tree
  5.2 Annotation guidelines
1 Executive Summary
This report provides practical guidelines for the use of the Multidimensional Quality Metrics (MQM) framework for assessing translation quality in scientific research projects. It does not systematically address the use of MQM in production environments, although notes on such use are provided. It covers the process for using MQM, the costs, required amounts of text, training methods, and other relevant factors. MQM can provide detailed insights about translation issues/errors at different levels of granularity, down to the word/phrase level, as input for systematic approaches to overcoming translation quality barriers. Like the common practice of post-editing, it requires manual work that will hopefully become less labor-intensive in the future through (partial) automation.
2 MQM Process
This section outlines the process for using MQM in a research scenario. It covers selection of a metric, training, the evaluation task itself, and analysis of results.
2.1 Selecting a metric
The Multidimensional Quality Metrics (MQM) framework does not provide a translation quality metric, but rather provides a framework for defining task-specific translation metrics. Thus, rather than speaking of or using MQM itself for a specific quality evaluation task, one uses an MQM-compliant metric. To create an MQM-compliant metric, one must determine which issues will be checked and to what level of granularity. At the coarsest level, it is possible to have an MQM-compliant metric that identifies as few as two error types: Accuracy and Fluency. (If only the target text is evaluated, it is even possible to have a single-issue metric with Fluency alone, but this metric could not be said to assess translation quality in any meaningful sense.) Generally, however, additional detail is desirable and a more detailed metric is needed. For example, the issue type hierarchy of the metric used for annotating corpus data in the QTLaunchPad project’s shared task can be graphically represented as shown in Figure 1. This particular metric was designed to provide analytic insight into the problems encountered in high-quality MT. With 19 issue types, it is considerably more granular than would be used in many production evaluation environments, but the detail was needed to support the QTLaunchPad evaluation tasks. Note that it extends the MQM issue set by adding three custom subtypes to Function words: Extraneous, Incorrect, and Missing. These issues provide additional insight into one aspect of translation that proved to be particularly difficult for MT.
Figure 1. MQM-compliant error hierarchy for diagnostic MT evaluation
This metric would not be suitable for all cases, and is presented here as an example. In general, an MQM-compliant metric designed for a research task should have the following qualities:
• It should be granular enough to address the relevant research questions. For example, a simple Accuracy-Fluency metric that emulates traditional Adequacy/Fluency evaluations in MT research would provide no insight into the specific nature of issues within those categories. The metric selected should therefore cover the full research agenda. (In the case of QTLaunchPad, the research agenda was broad and focused on discovery of patterns, so the metric is fairly complex.)
• The metric should not contain extraneous categories or ask annotators to mark issues irrelevant to the research question. For example, it does not make sense to use Terminology in addition to general Mistranslation when working on news data where no defined terminology exists. Adding categories can increase “noise” in the data and also raises the costs of annotation. However, if there are “borderline” categories that may be relevant, they should be included, since adding them retroactively would generally not be possible.
• The metric should be small enough to be maintained in the memory of the annotator. General psychometric guidelines suggest that categorizations used in evaluation should target six to seven items. For detailed evaluation such a small set may not be possible (the 19 categories of the MQM shared task are probably pushing the outer limit of what is cognitively possible for annotators to keep in mind).
• Annotators must be given heuristics for selection of issues in ambiguous cases. (Ways to provide this guidance are covered in Section 5 (Training materials) below.)
For translation production evaluation, the QTLaunchPad website’s section on MQM1 contains useful information on creating relevant metrics based on project specifications. Research projects, by contrast, typically have a clearer set of requirements (those needed to answer the research question at hand), but their metrics will also often be more complex than is recommended for production evaluation. After selecting the MQM issue types to be used in the evaluation task, an appropriate annotation environment needs to be configured to support the issue type selection. Both translate5 and the MQM scorecard are configured using a simple XML file (see Figure 2) that identifies the issues to be used. Other environments that could be configured to use MQM categories may use other mechanisms to declare a metric.
1 http://www.qt21.eu/launchpad/content/multidimensional-quality-metrics
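As an illustrative sketch (not part of the original deliverable), a metric definition in this XML format could be read with Python's standard library. The abbreviated definition below is a hypothetical subset of the Figure 2 file:

```python
import xml.etree.ElementTree as ET

# Abbreviated metric definition in the Figure 2 format (hypothetical subset).
METRIC_XML = """
<issues>
  <issue type="Accuracy" level="0" display="yes">
    <issue type="Mistranslation" level="1" display="yes"/>
    <issue type="Omission" level="1" display="yes"/>
  </issue>
  <issue type="Fluency" level="0" display="yes">
    <issue type="Grammar" level="1" display="yes"/>
  </issue>
</issues>
"""

def load_metric(xml_text):
    """Return a list of (issue type, level, displayed?) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), int(el.get("level")), el.get("display") == "yes")
            for el in root.iter("issue")]

for issue, level, shown in load_metric(METRIC_XML):
    print("  " * level + issue + ("" if shown else " (hidden)"))
```

A real tool would validate the file and preserve the parent/child nesting; this sketch only flattens the hierarchy for inspection.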
2.2 Selecting an Annotation Environment
There are a number of types of annotation environments:
• At the coarsest level are questionnaires, spreadsheets and simple scorecard tools that simply count errors (but do not indicate their location within text) or evaluate texts as a whole. These tools are useful for looking at features of the text as a whole, but do not provide detailed insight into specific errors. Such systems are generally not advisable for translation research tasks that involve error analysis (but they may be suitable in some production environments or for research projects where finer granularity is not needed).
• At a finer level of granularity are scorecard systems that store annotations at the segment level. They allow users to attach errors to specific segments, but not to specific words. They may support adding notes or highlighting text. These systems are typically easy to use but do not tie issues to specific locations. They are useful for quick annotation where it is sufficient to know which segments have which problems. The MQM Scorecard tool provides this functionality.
• Span-level annotation tools provide the ability to tie errors to particular spans in the text. Using them requires more training and care than is needed for the other tools since issues have to be associated with spans of text. These tools provide the greatest insight into errors. The translate5 tool used for most QTLaunchPad tasks is this sort of tool.
The environment selected must support the analysis intended for the annotated data. In general, it is wise to err on the side of caution and ask for more detail rather than less. After selecting the annotation environment, it must be configured with the text(s) to be annotated and the appropriate metric definition.

Figure 2. XML MQM metric definition file for use in translate5 and the MQM scorecard:

<issues>
  <issue type="Accuracy" level="0" display="yes">
    <issue type="Mistranslation" level="1" display="yes">
      <issue type="Terminology" level="2" display="yes" />
    </issue>
    <issue type="Omission" level="1" display="yes" />
    <issue type="Addition" level="1" display="yes" />
    <issue type="Untranslated" level="1" display="yes" />
  </issue>
  <issue type="Fluency" level="0" display="yes">
    <issue type="Content" level="1" display="no">
      <issue type="Register" level="2" display="yes" />
    </issue>
    <issue type="Mechanical" level="1" display="no">
      <issue type="Spelling" level="1" display="yes" />
      <issue type="Typography" level="1" display="yes" />
      <issue type="Grammar" level="1" display="yes" />
    </issue>
    <issue type="Unintelligible" level="1" display="yes" />
  </issue>
</issues>

2.3 Selection of Annotators and Training
Annotation is an intellectually demanding task. Three typical layers of annotation in MT development are:
1. The phenomenological level (target errors/issues) 2. The linguistic level (source or target POS, phrases, etc.) 3. The explanatory level (source/system-related causes for certain errors)
MQM annotation targets the phenomenological level. Depending on the complexity of the metric, it may require expert-level skill in both translation theory and linguistics. Within the QTLaunchPad project, it was found that expert human translators were ideal annotators. However, not all translators were equally capable. In general, those with formal training in linguistics or with previous experience in error annotation (e.g., using a company-specific error scorecard system) were the best prepared for MQM annotation. As inter-annotator agreement (IAA) did not exceed 50%, even with training, it is important in research environments to have multiple annotators in order to control for variability between individuals. Based on experience in the QTLaunchPad project, it is recommended that three annotators be used, if possible. It is anticipated that IAA would increase with experience and feedback, but in most research scenarios it is unlikely that annotators will work with MQM for an extended period. Training is vital since the task and the specific details of how to work with MQM-compliant metrics and tools are not immediately apparent, even to highly skilled individuals. In general, the following training steps and materials are required:
• A live demo of the annotation environment. This step is vital to ensure that annotators understand how to use the tool and are aware of all relevant features. Since annotation tools can be relatively complex, this demo should focus on a step-by-step explanation of the relevant process. It is recommended that the demo be recorded, if possible, for future reference.
• A decision tree and written annotation guidelines. A decision tree provides a relatively objective tool that helps guide the annotator to selection of the right issue. Written guidelines help annotators determine correct behavior in cases where the appropriate action is not self-evident (e.g., which portion of a text to mark when word order is wrong and multiple portions could be moved to fix the problem). These tools are discussed in Section 5 (Training materials) below.
• A calibration set. In this phase annotators are asked to work with a set where the error properties are well known to the researchers. The data in such a set could be “real” data or could be data with known errors introduced into it. Comparing the annotators’ results for the calibration set with the ideal profile allows the researcher to identify any problems or confusions with the evaluation and provide corrective guidance before the research data is considered. Note that the calibration set should be representative of the data to be annotated, and it is highly recommended that it in fact be drawn from the same data set as the data to be annotated. (E.g., if 1000 segments out of 1500 are to be evaluated, 150 might be set aside for calibration, with the 1000 used for the research question then taken from the remaining 1350.)
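The calibration/research split described in the last bullet could be implemented along the following lines (a sketch; the function name and fixed seed are our own choices, not project practice):

```python
import random

def split_for_calibration(segment_ids, calib_size, research_size, seed=0):
    """Randomly set aside a calibration set, then draw the research set
    from the remaining segments, so both come from the same data set."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    ids = list(segment_ids)
    rng.shuffle(ids)
    calibration = ids[:calib_size]
    research = ids[calib_size:calib_size + research_size]
    return calibration, research

# E.g., 1500 segments: 150 for calibration, 1000 for the research question.
calib, research = split_for_calibration(range(1500), 150, 1000)
print(len(calib), len(research))  # 150 1000
```

Drawing both sets from one shuffled pool guarantees they are disjoint and that the calibration data is representative of the research data.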
2.4 Evaluation
The evaluation/annotation task may proceed after training is completed and the results of the calibration set are verified. Based on experience, it is recommended that the annotators work in short sessions (perhaps 30 minutes) with frequent breaks. The amount that can be evaluated in a given time frame depends on the number of errors present in the text: “cleaner” texts are faster to evaluate and annotate than are “dirty” texts with many errors. For MT evaluation, there is often a significant portion of the text that has so many errors that annotation is counter-productive since the nature of the errors may not be clear or the entire
text may be unintelligible. Therefore it is recommended that the annotators conduct an initial “triage” phase in which segments are quickly categorized into one of three categories:
• perfect segments (which do not need to be annotated),
• segments to be annotated (the QTLaunchPad project targeted segments with 1–3 errors), and
• “garbage” segments, which contain too many errors to be annotated.
Annotation can then focus on the second category without worrying about the other two categories. If the triage task is conducted by more than one individual, appropriate policies for reconciling differences of opinion should be established (e.g., if one annotator marks a sentence as perfect and another as needing annotation, it is probably wise to circulate it for annotation by all annotators).
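The reconciliation policy suggested above can be sketched in a few lines; the label strings and function name are hypothetical, not a prescribed format:

```python
def reconcile_triage(labels):
    """Combine per-annotator triage labels for one segment.

    Policy sketch: if all annotators agree, keep their label; on any
    disagreement, circulate the segment for annotation by everyone.
    Labels are assumed to be 'perfect', 'annotate', or 'garbage'.
    """
    if len(set(labels)) == 1:
        return labels[0]
    return "annotate"

print(reconcile_triage(["perfect", "perfect"]))   # perfect
print(reconcile_triage(["perfect", "annotate"]))  # annotate
```

Other policies (e.g., majority vote for garbage segments) are possible; the point is to fix the rule before triage begins.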
2.5 Analysis
Multiple types of analysis are possible. Aggregate figures are often useful if multiple MT systems are being compared, as they can reveal system-level differences across engines. For determining the causes of specific errors, detailed analysis of specific issues is required. Whatever analysis is intended, it is important that data be preserved at all stages of transformation (e.g., if errors are extracted, the process should make a copy of the original data) since it is easy to make mistakes that can result in irretrievable data loss.
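For instance, an extraction step can operate on a copy so that the original annotations always survive later stages. A generic sketch (the data shape is hypothetical):

```python
import copy

def extract_errors(segments, issue_type):
    """Collect annotations of one issue type without mutating the input:
    the transformation works on a deep copy, so the original data is
    preserved for later stages of analysis."""
    snapshot = copy.deepcopy(segments)
    return [ann for seg in snapshot for ann in seg["annotations"]
            if ann["issue"] == issue_type]

data = [{"id": 1, "annotations": [{"issue": "Omission"}, {"issue": "Grammar"}]}]
print(extract_errors(data, "Omission"))  # [{'issue': 'Omission'}]
```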
3 Costs
Based on the QTLaunchPad tasks, which focused on “near miss” translations, the direct costs of annotation, including triage selection of data to annotate, were approximately €1.50/segment.2 With previously trained annotators, the amount would probably drop to €1.00–1.25/segment. However, costs for MQM-based analysis are highly variable. For text with few errors, annotation would be quite inexpensive. For text with many errors, annotation would be much more expensive. This variability is one of the reasons why a triage phase is strongly recommended, since it allows the researcher to select segments with relatively predictable costs. The cost per issue in the QTLaunchPad tasks was approximately €0.75. Since the number of issues will vary between tasks, cost per issue cannot predict costs, but it gives an idea of the productivity of evaluators. Finally, from the QTLaunchPad tasks the cost per word of annotation comes out to around €0.07–0.09/word. Accordingly, for pre-selected items with relatively few errors, the cost per thousand words would be around €7–9. If multiple annotations are factored in, the costs are multiplied by the number of annotators. In order to obtain sound data, then, the best estimate at present is that the cost is between €20 and €30 per 1000 words (assuming triple annotation). These figures do not include management or analysis, which can easily add 100–200% on top of the direct costs.
2 These figures are based on a payment system that paid a flat fee for a certain amount of text. An hourly fee was not used in QTLaunchPad because there was no previous experience on which to estimate time. However, if a fee of €50/hour is used with trained annotators, the figures presented here would be 15–20% lower for the sorts of text evaluated in the project.
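The per-segment figures above can be turned into a rough budget sketch. The function below is our own illustration; the default rate is the approximate QTLaunchPad figure, not a fixed price:

```python
def annotation_cost(segments, annotators=3, rate_per_segment=1.50):
    """Rough direct cost estimate in euros, using the approximate
    QTLaunchPad rate of ~EUR 1.50 per segment per annotator
    (previously trained annotators may be closer to EUR 1.00-1.25).
    Management and analysis overhead (roughly 100-200%) is excluded."""
    return segments * annotators * rate_per_segment

print(annotation_cost(100))  # 450.0 (100 segments, triple annotation)
```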
4 Amount of text required
There is no firm guidance for the amount of material needed for annotation. Based on QTLaunchPad results, it is possible to detect trends and identify major issue types with as few as 100–150 segments. Identifying rarer phenomena would require more data, since interesting phenomena would be expected to display a “long-tail” distribution, with certain kinds of errors (and causes) accounting for the bulk of problems, while other errors are less common. If the goal is just to identify high-level distribution, small data sets may suffice, but in the QTLaunchPad project, a concerted effort was made to identify the causes of problems, a task which required many more segments. As is typical, the more data one has the better.
5 Training materials
The most useful training materials are annotation guidelines and decision trees. A set of annotation guidelines and a decision tree initially developed in QTLaunchPad and updated for use in the QTLeap project are included at the end of this document. The following subsections describe these resources and how to create them.
5.1 Decision trees
Decision trees are useful tools for learning a specific MQM metric’s issue types and distinguishing between them. They are especially useful as a learning tool and as an aid in determining which issue applies in cases where the answer is not immediately apparent. There are at least as many possible decision trees as there are MQM metrics (more, in fact, because decision trees can present issues in multiple orders). This document provides some guidance for making decision trees.
Decision trees should work through branches of the hierarchy, with a single question separating each branch from other branches. This requirement is important because all children of a particular issue type (an issue and its children constitute a branch) could be classified as the parent type, so a single question is needed that can distinguish all of them as a group from other issues.
After determining which node an issue is contained within, it is important to resolve more specific issue types before more general ones. This guideline works on the principle of exclusion: by eliminating specific cases, the general case is what remains. For example, if an MQM metric has the following structure for Accuracy:
• Accuracy
  • Mistranslation
    • Terminology
      • Company terminology
    • Number
  • Omission
    • Omitted variable
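This example hierarchy can be represented as a nested mapping from which the children of any node can be read off. The representation below is our own sketch, not an MQM or tool format:

```python
# Hypothetical nested representation of the example Accuracy branch.
ACCURACY_BRANCH = {
    "Accuracy": {
        "Mistranslation": {
            "Terminology": {"Company terminology": {}},
            "Number": {},
        },
        "Omission": {"Omitted variable": {}},
    }
}

def children(tree, target):
    """Return the names of the direct children of `target`, or None if absent."""
    for name, subtree in tree.items():
        if name == target:
            return list(subtree)
        found = children(subtree, target)
        if found is not None:
            return found
    return None

print(children(ACCURACY_BRANCH, "Mistranslation"))  # ['Terminology', 'Number']
```

A decision-tree author would ask one question per node to separate these children from their siblings, working from the most specific types outward.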
The process to work through the hierarchy is as follows:
• Determine whether the issue is a type of Mistranslation or, if it is not, if it is a type of Omission. Since these are the specific types of Accuracy, they need to be eliminated before declaring the issue a general Accuracy issue.
• If it is one of the subtypes, questions must determine if the issue is one of their children (or grandchildren). For example, if an issue is a type of Mistranslation, the question “Was a number mistranslated?” would identify (or rule out) a mistranslation of a number; the question “Is a term translated incorrectly?” would identify (or rule out) Terminology. If the answer to both of those questions is “No”, then the issue is Mistranslation.
• A similar principle would ask the evaluator to rule out Company terminology before using a general Terminology issue type.
In accordance with the above, a decision tree for the Accuracy branch in this metric might look like the following:
1. Is content present in the source inappropriately omitted from the target? [This question selects or excludes Omission]
   • Yes: Go to question 2 [We know it is a type of Omission]
   • No: Go to question 3 [Omission has been excluded, so now we need to see if it is another type of Accuracy]
2. Is a variable omitted from the target content? [Tells us if the specific subtype of Omission should be selected]
   • Yes: Omitted variable
   • No: Omission [We have excluded the subtype of Omission, leaving the general type]
3. Are words or phrases translated incorrectly (i.e., is meaning conveyed by the source changed in the target)? [This question selects or excludes Mistranslation]
   • Yes: Go to question 4 [The issue is a type of Mistranslation]
   • No: Accuracy [Both Omission and Mistranslation have now been excluded, leaving only Accuracy]
4. Were numbers translated incorrectly? [Selects or excludes Number]
   • Yes: Number
   • No: Go to question 5 [Number is excluded, so we move on]
5. Is a domain- or organization-specific word or phrase translated incorrectly? [Selects or excludes Terminology]
   • Yes: Go to question 6 [We know it is a type of Terminology]
   • No: Mistranslation [We have excluded every other option]
6. Is the word or phrase translated contrary to company-specific terminology guidelines? [Identifies or excludes Company terminology]
   • Yes: Company terminology
   • No: Terminology [Company terminology has been excluded, leaving the more general Terminology]
Note that the order in which children of an element are selected is theoretically unimportant. For example, question 1 above could have served to select or exclude Mistranslation and question 2 could have focused on Omission. The important aspect is that subtypes are excluded before selecting a general type. Although there is no theoretical principle for placing one issue type before another, there may be practical reasons to do so: decision trees should be optimized for efficiency. If one issue type is expected to be quite rare while its sibling is more common, the more common sibling should be placed first to make it easier to find.
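The six questions above can be sketched as a chain of checks. The `answers` mapping (question number to yes/no) is a hypothetical interface, standing in for the evaluator's judgments:

```python
def classify_accuracy(answers):
    """Walk the Accuracy-branch decision tree sketched above.
    `answers` maps question numbers 1-6 to True (yes) or False (no)."""
    if answers[1]:                      # content omitted from the target?
        return "Omitted variable" if answers[2] else "Omission"
    if not answers[3]:                  # words/phrases translated incorrectly?
        return "Accuracy"               # all subtypes excluded
    if answers[4]:                      # numbers translated incorrectly?
        return "Number"
    if not answers[5]:                  # domain/organization-specific term?
        return "Mistranslation"
    return "Company terminology" if answers[6] else "Terminology"

# A mistranslated number: no omission, mistranslation yes, number yes.
print(classify_accuracy({1: False, 3: True, 4: True}))  # Number
```

The control flow mirrors the exclusion principle: each specific subtype is ruled out before a more general type is returned.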
5.1.1 A generalized decision tree
The attached decision tree covers the full MQM hierarchy. It is not expected that the entire tree will be used, but individual questions can be taken from this decision tree to build specific trees. (Note that, due to its complexity, this tree is optimally printed on A0 paper. An A4-sized version is included in this document for reference. The full-sized version is available at http://qt21.eu/downloads/fullDecisionTreeComplete.pdf.)
Note that portions of the tree generally should not be used without their parent issue unless the decision tree is intended to document only specific errors and not general types. For example, selecting Company terminology without its ancestor nodes Terminology, Mistranslation, and Accuracy would result in a tree that cannot identify more general error types. This approach might be appropriate if the only issue being assessed is adherence to company terminology guidelines. If a metric is created that identifies only specific subtypes (e.g., a metric that counts only terminology violations and distinguishes between company and normative terminology), a decision tree is still possible, but could not be made from this resource without modification.
To extract a portion of the tree that is less granular than the full tree, it is necessary to remove any unneeded children of the types to be assessed. Guidance for removing these issues is beyond the scope of this description, but the process is relatively straightforward if the MQM hierarchy is understood. Note that the specific questions may vary from those presented in the decision tree as long as they are capable of identifying the appropriate issues. The specific questions presented here are not to be treated as normative.
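Extracting a less granular tree amounts to dropping unneeded subtrees. A sketch using a nested-dict representation (our own illustration, matching no particular tool format):

```python
def prune(tree, keep):
    """Return a copy of a nested {issue: subtree} dict keeping only issues
    in `keep`; children of a removed issue are removed with it."""
    return {name: prune(subtree, keep)
            for name, subtree in tree.items() if name in keep}

full = {"Accuracy": {"Mistranslation": {"Terminology": {}}, "Omission": {}}}
print(prune(full, {"Accuracy", "Mistranslation"}))
# {'Accuracy': {'Mistranslation': {}}}
```

Because pruning removes whole subtrees, the result always respects the rule that a subtype is never kept without its parent.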
5.2 Annotation guidelines
Annotation guidelines provide practical guidance for the annotator. They need to provide a definition of the metric, instructions for how to realize the metric in the chosen annotation environment, and any specific items that need special attention. The guidelines are used in training, but also for reference during annotation. Therefore they need to be short and accessible. It may be advisable to maintain the guidelines in an accessible format where changes can be made to address queries and concerns that arise during annotation. A sample set of annotation guidelines is included at the end of this document. (The provided guidelines were given to annotators working on the MQM corpora analyzed in D1.3.1.)
Guide to selecting MQM issues for the MT Evaluation Metric
version 1.4 (2014 November 17)
Selecting issues can be a complex task. To assist evaluators, a decision tree helps guide the selection of appropriate issues. Use the decision tree not only for learning about MQM issues, but also to guide your annotation efforts and resolve any questions or concerns you may have.
Start at the upper left corner of the decision tree and then answer the questions and follow the arrows to find appropriate issues.
If using translate5, note that the decision tree is organized a bit differently than the hierarchy in translate5 because it eliminates specific issue types before moving to general ones, so familiarize yourself with how issues are organized in translate5 before beginning annotation.
Add notes in translate5 or the scorecard to explain any decisions that you feel need clarification, to ask questions, or to provide information needed to understand issues, such as notes about what has been omitted in a translation.
In addition to using the decision tree, please understand and follow the guidelines in this document. Email us at info@qt21.eu if you have questions that the decision tree and other content in this document do not address.
1. What is an error?
An error represents any issue you may find with the translated text that either does not correspond to the source or is considered incorrect in the target language. The list of language issues upon which you are to base your annotation is described in detail below and provides a range of examples.
The list is divided into two main issue categories, Accuracy and Fluency, each of which contains relevant, more detailed subcategories. Whenever possible, the correct subcategory should be chosen; however, if in doubt, please do not guess. Instead, select the category level about which you are most certain in order to avoid inconsistencies in the results.
Example: The German term Zoomfaktor was incorrectly translated as zoom shot factor, and you are unsure whether this represents a Mistranslation or an Addition. In this case, categorize the error as an Accuracy error, since it is unclear whether content has been added or a term mistranslated.
2. The Annotation Process
The translations you annotate should be a set of “near miss” (i.e., “almost perfect”) translations. Please follow these rules when selecting errors and tagging the respective text in the translations:
1. Use the examples in this documentation to understand specific classes.
2. If multiple types could be used to describe an issue (e.g., Agreement, Word form, Grammar, and Fluency), select the first one that the decision tree guides you to. The tree is organized along the following principles:
a. It prefers more specific types (e.g., Part of speech) to general ones (e.g., Grammar). However, if a specific type does not apply, it guides you to use the general type.
b. General types are used where the problem is of a general nature or where the specific problem does not have a precise type. For example He slept the baby exhibits what is technically known as a valency error, but because there is no specific type for this error available, it is assigned to Grammar.
3. Less is more. Only tag the relevant text. For example, if a single word is wrong in a phrase, tag only the single word rather than the entire phrase. If two words, separated by other words, constitute an error, mark only those two words separately. (See the section on “minimal markup” below.)
4. If correcting one error would take care of others, tag only that error. For example, if fixing an Agreement error would fix other related issues that derive from it, tag only the Agreement error, not the errors that result from it.
MQM annotators guidelines (version 1.4, 2014-11-17) Page 2
MQM Annotation Decision Tree

Note: For any question, if the answer is unclear, select “No”.

Start: Is the issue related to the fact that the text is a translation (e.g., the target text does not mean what the source text does)?
• Yes → go to Accuracy.
• No → go to Fluency.

Accuracy
1. Are words or phrases translated inappropriately? Yes → Mistranslation. No → next question.
2. Are terms translated incorrectly for the domain or contrary to any terminology resources? Yes → Terminology. No → next question.
3. Is there text in the source language that should have been translated? Yes → Untranslated. No → next question.
4. Is source content inappropriately omitted from the target? Yes → Omission. No → next question.
5. Has unneeded content been added to the target text? Yes → Addition. No → Accuracy (general)*.

Fluency
1. Are one or more words misspelled/capitalized incorrectly? Yes → Spelling. No → next question.
2. Is typography, other than misspelling or capitalization, used incorrectly? Yes → Typography. No → next question.
3. Is the text grammatically incorrect? Yes → go to Grammar (below). No → next question.
4. Is the text garbled or otherwise impossible to understand? Yes → Unintelligible. No → Fluency (general)*.

Grammar
1. Is the wrong form of a word used? If yes:
   a. Is the part of speech incorrect? Yes → Part of speech.
   b. Do two or more words not agree for person, number, or gender? Yes → Agreement.
   c. Is a wrong verb form or tense used? Yes → Tense/mood/aspect.
   d. Otherwise → Word form (general).
   If no → next question.
2. Do words appear in the wrong order? Yes → Word order. No → next question.
3. Are “function words” (prepositions, articles, “helper” verbs, etc.) incorrect? If yes:
   a. Does an unneeded function word appear? Yes → Extraneous.
   b. Is a needed function word missing? Yes → Missing.
   c. Is an incorrect function word used? Yes → Incorrect.
   d. Otherwise → Function words (general).
   If no → Grammar (general).

* Please describe any Fluency (general) or Accuracy (general) issues using the Notes feature.
Examples

Source: Importfilter werden geladen
Translation: Import filter are being loaded
Correct: Import filters are being loaded
In this example, the only error is the translation of filter in the singular rather than the plural (as made clear by the verb form in the source text). This case should be classified as Mistranslation, even though it shows problems with agreement: if the subject had been translated properly, the agreement problem would be resolved. In this case only filter should be tagged as a Mistranslation.
Source: im Dialog Exportieren
Translation: in the dialog export
Correct: in the Export dialog
In this example, only Mistranslation should be marked. While Word order and Spelling (capitalization) would be considered errors in other contexts, this would not be the case here, as these two words constitute one term that has been incorrectly translated.
5. If one word contains two errors (e.g., it has a Spelling issue and is also an Extraneous function word), enter both errors separately and mark the respective word in both cases.
6. If in doubt, choose a more general category. The categories Accuracy and Fluency can be used if the nature of an error is unclear. In such cases, providing notes to explain the problem will assist the QTLaunchPad team in its research.
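The rules above amount to recording each error as an issue type attached to one or more short text spans. A minimal sketch of such a representation, assuming character-offset spans (the class and field names are illustrative, not part of MQM or any QTLaunchPad tool):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One MQM issue marked over a span of target text (character offsets)."""
    start: int
    end: int
    issue: str      # e.g. "Mistranslation", "Agreement"
    note: str = ""  # free-text explanation (the Notes feature)

text = "Double click on the number faded in the status bar."
# Minimal markup (rule 3): tag only the problematic word "faded",
# not the surrounding phrase.
start = text.index("faded")
ann = Annotation(start, start + len("faded"), "Mistranslation")
# Rule 5: a single word carrying two errors gets two separate
# annotations over the same span.
double = [Annotation(ann.start, ann.end, "Spelling"),
          Annotation(ann.start, ann.end, "Function words")]
print(ann.issue, text[ann.start:ann.end])  # Mistranslation faded
```

Two errors on one word are two Annotation objects over the same offsets, never one object with a combined label.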
3. Tricky cases

The following are cases that have been encountered in practice and that we wish to clarify.
• Function words: In some cases issues related to function words break the accuracy/fluency division seen in the decision tree because they are listed under Fluency even though they may impact meaning. Despite this issue, please categorize them as the appropriate class under Function words.
Example: The ejector may be found with the external case (should be on in this case). Even though this error changes the meaning, it should be classified as Function words: incorrect in the Fluency branch.
• Word order: Word order problems often affect long spans of text. When encountering a word order issue, mark the smallest possible portion that could be moved to correct the problem.
Example: He has the man with the telescope seen. Here only seen should be marked as moving this one word would fix the problem.
• Hyphenation: Hyphenation issues sometimes occur in untranslated content and should be classified as such. Otherwise they should be classified as Spelling.
Examples: Load the XML-files (Spelling). Nützen Sie die macro-lens (Untranslated, if the source has macro-lens as well).
• Number: A plural vs. singular mismatch between source and target is classified as Mistranslation.
• Terminology: Inappropriate use of terms as distinct from general-language Mistranslation.
Example: An English translation uses the term thumb drive to translate the German USB Speicherkarte. This translation is intelligible, but if the translation mandated in specifications or a relevant termbase is USB memory stick, the use of thumb drive constitutes a Terminology error, even if thumb drive would be acceptable in everyday usage. However, if USB Speicherkarte were translated as USB Menu, this would be a Mistranslation, since the words would be translated incorrectly regardless of whether the original phrase is a term.
NOTE: Because no terminology list is provided, please use your understanding of relevant IT terminology for the evaluation task.
• Unintelligible: Use Unintelligible if content cannot be understood and the reason cannot be analyzed according to the decision tree. This category is used as a last resort for text where the nature of the problem is not clear at all.
Example: In the sentence “You can also you can use this tab to precision, with the colours are described as well as the PostScript Level,” there are enough errors that the meaning is unclear and the precise nature of the errors that lead to its unintelligibility cannot be easily determined.
• Agreement: This category generally refers to agreement between subject and predicate or gender and case.
Examples: The boy was playing with her own train. I is at work.
• Untranslated: Many words may look as if they were translated, with only capitalization or hyphenation rules left unapplied. In most cases this represents an untranslated term, not a Spelling error. If the target word or phrase is identical to the source word or phrase, treat it as Untranslated, even if a Spelling error could also account for the problem.
4. Minimal markup

It is vital that errors be marked up with the shortest possible spans. Markup must identify only the area needed to specify the problem. In some cases this requirement means that two separate spans must be identified.
The following examples help clarify the general principles:
(Marked spans are shown in [brackets].)

• Double click on the number faded in the status bar. [Mistranslation]
Problem: Only the single word faded is problematic, but the incorrect markup indicates that number faded in is incorrect.
Correct minimal markup: Double click on the number [faded] in the status bar.

• The standard font size for dialogs is 12pt, which corresponds to a standard of 100%. [Mistranslation]
Problem: Only the term Maßstab (rendered as standard) has been translated incorrectly. The larger span in the incorrect markup indicates that text that is perfectly fine has a problem.
Correct minimal markup: The standard font size for dialogs is 12pt, which corresponds to a [standard] of 100%.

• The in 1938 nascent leader with flair divined %temp_name eating lonely. [Unintelligible]
Problem: The entire sentence is Unintelligible and should be marked as such.
Correct minimal markup: [The in 1938 nascent leader with flair divined %temp_name eating lonely.]
As noted above, Word order can be problematic because it is often unclear what portion(s) of the text should be marked. In cases of word order, mark the shortest portion of text (in number of words) that could be moved to fix the problem. If two portions of the text could resolve the problem and are equal in length, mark the one that occurs first in the text. The following examples provide guidance:
• The telescope big observed the operation.
Problem: Moving the word telescope would solve the problem, and only this word should be marked (since it occurs first in the text).
Correct minimal markup: The [telescope] big observed the operation.

• The eruption by many instruments was recorded.
Problem: Although this entire portion shows word order problems, moving was recorded would resolve the problem (and is the shortest span that would resolve it).
Correct minimal markup: The eruption by many instruments [was recorded].

• The given policy in the manual user states that this action voids the warranty.
Problem: This example actually has two separate issues that should be marked separately.
Correct minimal markup: two separate spans, one in the given policy and one in the manual user.
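The span-selection rule for Word order (shortest movable span, earliest on a tie) can also be stated procedurally. The following is an illustrative sketch only; the function and data representation are not part of the MQM guidelines or any QTLaunchPad tooling:

```python
def minimal_word_order_span(candidates):
    """Pick the span to mark for a Word order issue.

    candidates: list of (start_word_index, words_tuple) pairs, each a span
    whose movement would fix the sentence. Prefer the span with the fewest
    words; on a tie, prefer the span that occurs first in the text.
    """
    return min(candidates, key=lambda c: (len(c[1]), c[0]))

# "The telescope big observed the operation": moving either "telescope"
# or "big" fixes the order; both are one word, so mark the earlier one.
cands = [(1, ("telescope",)), (2, ("big",))]
print(minimal_word_order_span(cands))  # (1, ('telescope',))
```

The tuple key encodes the two criteria in priority order: span length first, then position in the text.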
Agreement poses special challenges because portions that disagree may be widely separated. To select appropriate minimal spans, consider the following guidelines:
• If two items disagree and it is readily apparent which should be fixed, mark only the portion that needs to be fixed. E.g., in “The man and its companion were business partners” it is readily apparent that its should be his and the wrong grammatical gender has been used, so only its should be marked.
• If two items disagree and it is not clear which portion is incorrect, mark both items, each tagged for Agreement, as shown in the example below.
The following examples demonstrate how to mark Agreement:
• The man and its companion were business partners. [Agreement]
Problem: In this example, it is clear that its is the problematic portion and that man is correct, so only its should be marked.
Correct minimal markup: The man and [its] companion were business partners.

• The man whom they saw on Friday night at the store were very big. [Agreement]
Problem: Here it is not clear whether man or were is the error, since there is nothing to indicate whether singular or plural is intended. Highlighting only a single word is insufficient to identify the agreement problem.
Correct minimal markup: The [man] whom they saw on Friday night at the store [were] very big — both words marked as separate Agreement issues. In such cases, use the Notes field to explain the decision.
In the event of questions about the scope of markup that should be used, utilize the Notes field to make a query or explain your choice.
A. Issue categories
The error corpus uses the following issue categories:
• Accuracy. Accuracy addresses the extent to which the target text accurately renders the meaning of the source text. For example, if a translated text tells the user to push a button when the source tells the user not to push it, there is an accuracy issue.
• Mistranslation. The target content does not accurately represent the source content.
Example: A source text states that a medicine should not be administered in doses greater than 200 mg, but the translation states that it should not be administered in doses less than 200 mg.
Note(s): Mistranslation can be used for both words and phrases.
• Terminology. Domain- or industry-specific terms (including multi-word terms) are translated incorrectly.
Example: In a musicological text the term dog is encountered and translated into German as Hund rather than the domain-specific term Schnarre.
Note(s): Terminology errors may be valid translations for the source word in general language, but are incorrect for the specific domain or organization.
• Omission. Content is missing from the translation that is present in the source.
Example: A source text refers to a “mouse pointer” but the translation does not mention it.

Note(s): Omission should be reserved for cases where content present in the source and essential to its meaning is not found in the target text.
• Addition. The target text includes text not present in the source.
Example: A translation includes portions of another translation that were inadvertently pasted into the document.
• Untranslated. Content that should have been translated has been left untranslated.
Example: A sentence in a Japanese document translated into English is left in Japanese.
Note(s): As noted above, if a term is passed through untranslated, it should be classified as Untranslated rather than as Mistranslation.
• Fluency. Fluency relates to the monolingual qualities of the source or target text, relative to agreed-upon specifications but independent of the relationship between source and target. In other words, fluency issues can be assessed without regard to whether the text is a translation. For example, a spelling error or a problem with register remains an issue regardless of whether the text is translated or not.
• Spelling. Issues related to the spelling of words (including capitalization).
Examples: The German word Zustellung is spelled Zustetlugn. The name John Smith is written as “john smith”.
• Typography. Issues related to the mechanical presentation of text. This category should be used for any typographical errors other than spelling.
Examples: Extra, unneeded carriage returns are present in a text. A semicolon is used in place of a comma.
• Grammar. Issues related to the grammar or syntax of the text, other than spelling and orthography.
Example: An English text reads “The man was in seeing the his wife.”

Note(s): Use Grammar only if no subtype accurately describes the issue.
• Word form. The wrong form of a word is used. Subtypes should be used when possible.
Example: An English text has comed instead of came.
• Part of speech. A word is the wrong part of speech.
Example: A text reads “Read these instructions careful” instead of “Read these instructions carefully.”
• Agreement. Two or more words do not agree with respect to case, number, person, or other grammatical features.
Example: A text reads “They was expecting a report.”
• Tense/aspect/mood. A verbal form inappropriate for the context is used.

Example: An English text reads “Yesterday he sees his friend” instead of “Yesterday he saw his friend”; an English text reads “The button must be pressing” instead of “The button must be pressed”.
• Word order. The word order is incorrect
Example: A German text reads “Er hat gesehen den Mann” instead of “Er hat den Mann gesehen.”
• Function words. Linguistic function words such as prepositions, particles, and pronouns are used incorrectly
Example: An English text reads “He beat him around” instead of “he beat him up.”
Note(s): Function words is used for cases where individual words with a grammatical function are used incorrectly. The most common problems will involve prepositions and particles. For languages where verbal prefixes play a significant role in meaning (as in German), they should be included here, even if they are not independent words.
There are three subtypes of Function words, used to indicate whether an unneeded function word is present (Extraneous), a needed function word is missing (Missing), or an incorrect function word is used (Incorrect). Evaluators should use the Notes field to specify details for missing function words.
• Unintelligible. The exact nature of the error cannot be determined; this indicates a major breakdown in fluency.
Example: The following text appears in an English translation of a German automotive manual: “The brake from whe this કુતારો િસ S149235 part numbr,,."
Note(s): Use this category sparingly for cases where further analysis is too uncertain to be useful. If an issue is categorized as Unintelligible no further categorization is required. Unintelligible can refer to texts where a significant number of issues combine to create a text for which no further determination of error type can be made or where the relationship of target to source is entirely unclear.
Multidimensional Quality Metrics (MQM): Full Decision Tree

The Multidimensional Quality Metrics (MQM) Framework provides a hierarchical categorization of error types that occur in translated or localized products. Based on a detailed analysis of existing translation quality metrics, it provides a flexible typology of issue types that can be applied to analytic or holistic translation quality evaluation tasks. Although the full MQM issue tree (which, as of November 2014, contains 115 issue types categorized into five major branches) is not intended to be used in its entirety for any particular evaluation task, this overview chart presents a “decision tree” suitable for selecting an issue type from it. In practical terms, however, an individual metric would have a smaller decision tree that covers just the issues contained in that metric. To use the decision tree, start with the first question and follow the appropriate answers until a specific issue type is reached.

To learn more: http://www.qt21.eu · MQM definition: http://qt21.eu/mqm-definition/

Start here
General 1. Is the issue related to a difference in meaning between the source and target? Yes → go to Accuracy. No → go to General 2.
General 2. Is the issue related to the linguistic or mechanical formulation of the content? Yes → go to Fluency. No → go to General 3.
General 3. Is the issue related to the appropriateness of the content for the target audience or locale (separate from whether it is translated correctly)? Yes → go to Verity. No → go to General 4.
General 4. Is the issue related to the presentational/display aspects of the content? Yes → go to Design. No → go to General 5.
General 5. Is the issue related to whether or not the content was set up properly to support subsequent translation/adaptation? Yes → go to Internationalization. No → go to General 6.
General 6. Is the issue addressed in the Compatibility branch? Yes → go to Compatibility. No → Other.

Note: If the issue is found in the Compatibility branch, it is probably not a translation product issue but instead a process issue, and would normally not be addressed in MQM. If the issue is Other, it may not be a translation-related issue, since translation-related issues would normally fall into one of the major branches.

Accuracy
A1. Has content present in the source been inappropriately omitted from the target? Yes → go to A2. No → go to A3.
A2. Is a variable omitted from the target content? Yes → Omitted variable. No → Omission.
A3. Has content not present in the source been inappropriately added to the target? Yes → Addition. No → go to A4.
A4. Has content been left in the source language that should have been translated? Yes → go to A5. No → go to A6.
A5. Is the untranslated content in a graphic? Yes → Untranslated graphic. No → Untranslated.
A6. Are words or phrases translated incorrectly? Yes → go to A7. No → Accuracy (general).
A7. Is a domain- or organization-specific word or phrase translated incorrectly? Yes → go to A8. No → go to A10.
A8. Is the word or phrase translated contrary to company-specific terminology guidelines? Yes → Company terminology. No → go to A9.
A9. Is the word or phrase translated contrary to guidelines established in a normative document (e.g., law or standard)? Yes → Normative terminology. No → Terminology.
A10. Is the translation overly literal? Yes → Overly literal. No → go to A11.
A11. Is the translated content a “false friend” (faux ami)? Yes → False friend. No → go to A12.
A12. Is a named entity (such as the name of a person, place, or organization) translated incorrectly? Yes → Entity. No → go to A13.
A13. Was content translated that should not have been translated? Yes → Should not have been translated. No → go to A14.
A14. Was a date or time translated incorrectly? Yes → Date/time. No → go to A15.
A15. Were units (e.g., for measurement or currency) translated incorrectly? Yes → Unit conversion. No → go to A16.
A16. Were numbers translated incorrectly? Yes → Number. No → go to A17.
A17. Is the translation an improper exact match from translation memory? Yes → Improper exact match. No → Mistranslation.

Fluency
F1. Is the content written at a level of formality inappropriate for the subject matter, audience, or text type? Yes → go to F2. No → go to F3.
F2. Does the content use slang or other unsuitable word variants? Yes → Variants/slang. No → Register.
F3. Is the content stylistically inappropriate? Yes → Stylistics. No → go to F4.
F4. Is the content inconsistent with itself? Yes → go to F5. No → go to F10.
F5. Are abbreviations used inconsistently? Yes → Abbreviations. No → go to F6.
F6. Is text inconsistent with graphics? Yes → Image vs. text. No → go to F7.
F7. Is the discourse structure of the content inconsistent? Yes → Discourse. No → go to F8.
F8. Is terminology inconsistent within the content (without being a mistranslation)? Yes → Terminological inconsistency. No → go to F9.
F9. Are cross-references or links inconsistent in what they point to? Yes → Inconsistent link/cross-reference. No → Inconsistency.
F10. Does the content use unidiomatic expressions? Yes → Unidiomatic. No → go to F11.
F11. Is content inappropriately duplicated? Yes → Duplication. No → go to F12.
F12. Is the wrong term used? (Generally assessed for source text only.) Yes → go to F13. No → go to F14.
F13. Is the term used contrary to guidelines established in a normative document (e.g., law or standard)? Yes → Monolingual normative terminology. No → Monolingual terminology.
F14. Is the content ambiguous? Yes → go to F15. No → go to F16.
F15. Is a pronoun or other linguistically referential structure unclear as to its reference/antecedent? Yes → Unclear reference. No → Ambiguity.
F16. Is content spelled incorrectly (including incorrect capitalization)? Yes → go to F17. No → go to F19.
F17. Is content capitalized incorrectly? Yes → Capitalization. No → go to F18.
F18. Are diacritics (e.g., ¨, ´, ˝, ˜) missing or incorrect? Yes → Diacritics. No → Spelling.
F19. Does the content violate a formal style guide (e.g., Chicago Manual of Style or an organization style guide)? Yes → go to F20. No → go to F22.
F20. Is the violation specific to a company/organization’s internal/house style guide? Yes → Company style. No → go to F21.
F21. Is the violation of a third-party style guide (e.g., Chicago Manual of Style, American Psychological Association)? Yes → 3rd-party style. No → Style guide.
F22. Does the content display problems with typography (spacing or punctuation)? Yes → go to F23. No → go to F26.
F23. Are quote marks or brackets unpaired (i.e., one of a paired set of punctuation is missing)? Yes → Unpaired quote marks or brackets. No → go to F24.
F24. Is punctuation used incorrectly? Yes → Punctuation. No → go to F25.
F25. Is whitespace used incorrectly (i.e., missing, extra, inconsistent)? Yes → Whitespace. No → Typography.
F26. Is the content grammatically incorrect? Yes → go to F27. No → go to F33.
F27. Is an incorrect form of a word used? Yes → go to F28. No → go to F31.
F28. Is the wrong part of speech used? Yes → Part of speech. No → go to F29.
F29. Does the content show problems with agreement (number, gender, case, etc.)? Yes → Agreement. No → go to F30.
F30. Does the content use an incorrect verbal tense, mood, or aspect? Yes → Tense/mood/aspect. No → Word form.
F31. Are words in the wrong order? Yes → Word order. No → go to F32.
F32. Are function words (such as articles, “helper verbs”, or prepositions) used incorrectly? Yes → Function words. No → Grammar.
F33. Does the content violate locale-specific conventions (i.e., it is fine for the language, but not for the target locale)? Yes → go to F34. No → go to F40.
F34. Are dates shown in the wrong format for the target locale (e.g., D-M-Y when Y-M-D is expected)? Yes → Date format. No → go to F35.
F35. Are times in the wrong format for the target locale (e.g., AM/PM when 24-hour time is expected)? Yes → Time format. No → go to F36.
F36. Are measurements in the wrong format for the target locale (e.g., metric units used when Imperial are expected)? Yes → Measurement format. No → go to F37.
F37. Are numbers formatted incorrectly for the target locale (e.g., a comma used as thousands separator when a dot is expected)? Yes → Number format. No → go to F38.
F38. Does the content use the wrong type of quote mark for the target locale (e.g., single quotes when double quotes are expected)? Yes → Quote mark type. No → go to F39.
F39. Does the content violate any relevant national language standards (e.g., using disallowed words from another locale)? Yes → National language standard. No → Locale convention.
F40. Does the content use an incorrect character encoding? Yes → Character encoding. No → go to F41.
F41. Does the content use characters that are not allowed according to specifications? Yes → Nonallowed characters. No → go to F42.
F42. Does the content violate a formal pattern (e.g., a regular expression) that defines what the content may contain? Yes → Pattern problem. No → go to F43.
F43. Is content sorted incorrectly for the target locale and sorting type? Yes → Sorting. No → go to F44.
F44. Is the content inconsistent with a corpus of known-good content? (Note: almost always determined by a computer program.) Yes → Corpus conformance. No → go to F45.
F45. Are links or cross-references broken or inaccurate? Yes → go to F46. No → go to F47.
F46. Are internal links or cross-references broken or inaccurate? Yes → Document-internal. No → Document-external.
F47. Are there problems with an index or Table of Contents (ToC)? Yes → go to F48. No → go to F51.
F48. Are page references in an index or Table of Contents (ToC) incorrect? Yes → Page references. No → go to F49.
F49. Is the format of an index or Table of Contents (ToC) incorrect? Yes → Index/TOC format. No → go to F50.
F50. Are items missing from an index or Table of Contents (ToC)? Yes → Missing/incorrect item. No → Index/TOC.
F51. Is content unintelligible (i.e., the fluency is bad enough that the nature of the problem cannot be determined)? Yes → Unintelligible. No → Fluency.

Verity
V1. Is the content unsuitable for the end-user (target audience)? Yes → End-user suitability. No → go to V2.
V2. Is the content incomplete or missing needed information? Yes → go to V3. No → go to V5.
V3. Are lists within the content incomplete or missing needed information? Yes → Lists. No → go to V4.
V4. Are procedures described within the content incomplete or missing needed information? Yes → Procedures. No → Completeness.
V5. Does the content violate any legal requirements for the target locale or intended audience? Yes → Legal requirements. No → go to V6.
V6. Does the content inappropriately include information that does not apply to the target locale or that is otherwise inaccurate for it? Yes → Locale-specific content. No → Verity.

Design
D1. Does the formatting issue apply globally to the entire document? Yes → go to D2. No → go to D8.
D2. Are colors used incorrectly? Yes → Color. No → go to D3.
D3. Is the overall font choice incorrect? Yes → Global font choice. No → go to D4.
D4. Are footnotes/endnotes formatted incorrectly? Yes → Footnote/endnote format. No → go to D5.
D5. Are margins for the document incorrect? Yes → Margins. No → go to D6.
D6. Are widows/orphans present in the content? Yes → Widows/orphans. No → go to D7.
D7. Are there improper page breaks? Yes → Page break. No → Overall design (layout).
D8. Is local formatting (within content) incorrect? Yes → go to D9. No → go to D17.
D9. Is text aligned incorrectly? Yes → Text alignment. No → go to D10.
D10. Are paragraphs indented improperly or not indented when they should be? Yes → Paragraph indentation. No → go to D11.
D11. Are fonts used incorrectly within content (rather than globally)? Yes → go to D12. No → go to D15.
D12. Are bold or italic used incorrectly? Yes → Bold/italic. No → go to D13.
D13. Is a wrong font size used? Yes → Wrong size. No → go to D14.
D14. Are single-width fonts used when double-width fonts should be used (or vice versa)? (Applies to CJK text only.) Yes → Single/double-width. No → Font.
D15. Is text kerning (space between letters) incorrect (text too tight/too loose)? Yes → Kerning. No → go to D16.
D16. Is the leading (line spacing of text) incorrect (e.g., double spacing when single spacing is expected)? Yes → Leading. No → Local formatting.
D17. Is translated text missing from the layout (i.e., it has been translated but is not visible in the formatted version)? Yes → Missing text. No → go to D18.
D18. Is markup (e.g., formatting codes) used incorrectly or in a technically invalid fashion? Yes → go to D19. No → go to D24.
D19. Is markup used inconsistently (e.g., <i> is used in some places and <em> in others)? Yes → Inconsistent markup. No → go to D20.
D20. Does markup appear in the wrong place within content? Yes → Misplaced markup. No → go to D21.
D21. Has markup been inappropriately added to the content? Yes → Added markup. No → go to D22.
D22. Is needed markup missing from the content? Yes → Missing markup. No → go to D23.
D23. Does markup appear to be incorrect? (Note: generally detected by computer processes.) Yes → Missing markup. No → Markup.
D24. Are there problems with graphics and/or tables? Yes → go to D25. No → go to D28.
D25. Are graphics or tables positioned incorrectly on the page or with respect to surrounding text? Yes → Position. No → go to D26.
D26. Are graphics or tables missing from the text? Yes → Missing graphic/table. No → go to D27.
D27. Are there problems with call-outs or captions for graphics or tables? Yes → Call-outs and captions. No → Graphics and tables.
D28. Are portions of text invisible due to text expansion? Yes → Truncation/text expansion. No → go to D29.
D29. Is text longer than is allowed (but remains visible)? Yes → Truncation/text expansion. No → Length.

Internationalization
Subtypes of Internationalization are currently undefined.

Compatibility (deprecated)
These issues are included primarily for compatibility with the LISA QA Model: Application compatibility • Bill of materials/runlist • Book-building sequence • Covers • Deadline • Delivery • Does not adhere to specifications • Embedded text • File format • Functional • Output device • Printing • Release guide • Spines • Style, publishing standards • Terminology, contextually inappropriate.
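The branch selection at the top of the chart (General 1 through General 6) is a first-match cascade: the questions are asked in order and the first “yes” decides the branch. A minimal sketch, with questions paraphrased from the chart (the function and data names are illustrative, not part of MQM tooling):

```python
# Each entry: (question, branch chosen on "yes"). Asked in order; the
# first "yes" wins, mirroring General 1-6 in the chart above.
GENERAL = [
    ("Difference in meaning between source and target?", "Accuracy"),
    ("Linguistic or mechanical formulation of the content?", "Fluency"),
    ("Appropriateness of the content for the target audience or locale?", "Verity"),
    ("Presentational/display aspects of the content?", "Design"),
    ("Content not set up properly for subsequent translation/adaptation?", "Internationalization"),
    ("Addressed in the Compatibility branch?", "Compatibility"),
]

def select_branch(answers):
    """answers: booleans, one per question in order; unanswered = no."""
    for (question, branch), yes in zip(GENERAL, answers):
        if yes:
            return branch
    return "Other"  # no question answered "yes"

print(select_branch([False, True]))  # Fluency
```

Within a branch, the numbered questions (A1–A17, F1–F51, V1–V6, D1–D29) would be followed the same way, each answer leading either to an issue type or to the next question.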