
This document is part of the Coordination and Support Action “Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad)”. This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 296347.

Supplement 1

Practical Guidelines for the Use of MQM in

Scientific Research on Translation Quality

Author(s): Aljoscha Burchardt and Arle Lommel (DFKI)

Dissemination Level: Public

Date: 19.11.2014

This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).


Grant agreement no.: 296347
Project acronym: QTLaunchPad
Project full title: Preparation and Launch of a Large-scale Action for Quality Translation Technology
Funding scheme: Coordination and Support Action
Coordinator: Prof. Hans Uszkoreit (DFKI)
Start date, duration: 1 July 2012, 24 months
Distribution: Public
Contractual date of delivery: —
Actual date of delivery: 18 November 2014
Supplement number: 1
Supplement title: Practical Guidelines for the Use of MQM in Scientific Research on Translation Quality
Type: Report
Status and version: Final, v1.0
Number of pages:
Contributing partners: DFKI
Authors: Aljoscha Burchardt, Arle Lommel
EC project officer: Aleksandra Wesolowska

The partners in QTLaunchPad are:

Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany

Dublin City University (DCU), Ireland

Institute for Language and Speech Processing, R.C. “Athena” (ILSP/ATHENA RC), Greece

The University of Sheffield (USFD), United Kingdom

For copies of reports, updates on project activities and other QTLaunchPad-related information, contact:

DFKI GmbH, QTLaunchPad
Dr. Aljoscha Burchardt
[email protected]
Alt-Moabit 91c, 10559 Berlin, Germany
Phone: +49 (30) 23895-1838
Fax: +49 (30) 23895-1810

Copies of reports and other material can also be accessed via http://www.qt21.eu/launchpad

© 2014, The Individual Authors



Table of Contents

1 Executive Summary
2 MQM Process
  2.1 Selecting a metric
  2.2 Selecting an Annotation Environment
  2.3 Selection of Annotators and Training
  2.4 Evaluation
  2.5 Analysis
3 Costs
4 Amount of text required
5 Training materials
  5.1 Decision trees
    5.1.1 A generalized decision tree
  5.2 Annotation guidelines


1 Executive Summary

This report provides practical guidelines for the use of the Multidimensional Quality Metrics (MQM) framework for assessing translation quality in scientific research projects. It does not address the use of MQM in production environments systematically, although notes are provided concerning its use in these environments. It covers the process for using MQM, the costs, required amounts of text, training methods, and other relevant factors. MQM can provide detailed insights into translation issues/errors at different levels of granularity, down to the word/phrase level, as input for systematic approaches to overcoming translation quality barriers. Like the common practice of post-editing, it requires manual work that will hopefully become less labor-intensive in the future through (partial) automation.

2 MQM Process

This section outlines the process for using MQM in a research scenario. It covers selection of a metric, training, the evaluation task itself, and analysis of results.

2.1 Selecting a metric

The Multidimensional Quality Metrics (MQM) framework does not provide a translation quality metric, but rather a framework for defining task-specific translation metrics. Thus, rather than speaking of or using MQM itself for a specific quality evaluation task, one uses an MQM-compliant metric. To create an MQM-compliant metric, one must determine which issues will be checked and to what level of granularity. At the coarsest level, it is possible to have an MQM-compliant metric that identifies as few as two error types: Accuracy and Fluency. (If only the target text is evaluated, it is even possible to have a single-issue metric with Fluency alone, but this metric could not be said to assess translation quality in any meaningful sense.) Generally, however, additional detail is desirable and a more detailed metric is needed. For example, the issue type hierarchy of the metric used for annotating corpus data in the QTLaunchPad project's shared task can be graphically represented as shown in Figure 1. This particular metric was designed to provide analytic insight into the problems encountered in high-quality MT. With 19 issue types, it is considerably more granular than would be used in many production evaluation environments, but the detail was needed to support the QTLaunchPad evaluation tasks. Note that it extends the MQM issue set by adding three custom subtypes to Function words: Extraneous, Incorrect, and Missing. These issues provide additional insight into one aspect of translation that proved to be particularly difficult for MT.


Figure 1. MQM-compliant error hierarchy for diagnostic MT evaluation

This metric would not be suitable for all cases and is presented here as an example. In general, an MQM-compliant metric designed for a research task should have the following qualities:

• It should be granular enough to address the relevant research questions. For example, a simple Accuracy-Fluency metric that emulates traditional Adequacy/Fluency evaluations in MT research would provide no insight into the specific nature of issues within those categories. Therefore, the metric selected should be certain to cover the research agenda. (In the case of QTLaunchPad, the research agenda was broad and focused on discovery of patterns, so the metric is fairly complex.)

• The metric should not contain extraneous categories or ask annotators to mark issues irrelevant to the research question. For example, it does not make sense to use Terminology in addition to general Mistranslation when working on news data where no defined terminology exists. Adding categories can increase "noise" in the data and also raises the cost of annotation. However, if there are "borderline" categories that may be relevant, they should be included, since retroactively adding them would generally not be possible.

• The metric should be small enough to be maintained in the memory of the annotator. General psychometric guidelines suggest that categorizations used in evaluation should target six to seven items. For detailed evaluation such a small set may not be possible (the 19 categories of the MQM shared task are probably pushing the outer limit of what is cognitively possible for annotators to keep in mind).

• Annotators must be given heuristics for selecting issues in ambiguous cases. (Ways to provide this guidance are covered in Section 5, Training materials, below.)

For translation production evaluation, the QTLaunchPad website's section on MQM1 contains useful information on creating relevant metrics based on project specifications. Research projects, by contrast, typically have a clearer set of requirements (those needed to answer the research question at hand), but will also often be more complex than is recommended for production evaluation.

After selecting the MQM issue types to be used in the evaluation task, an appropriate annotation environment needs to be configured to support the issue type selection. Both translate5 and the MQM scorecard are configured using a simple XML file (see Figure 2) that identifies the issues to be used. Other environments that could be configured to use MQM categories may use other mechanisms to declare a metric.

1 http://www.qt21.eu/launchpad/content/multidimensional-quality-metrics
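Both tools read such a definition file directly; in a research pipeline it can also be useful to load the metric programmatically. The following is a minimal sketch using Python's standard library, not part of the official tooling; the XML is an abridged version of the Figure 2 metric and the function name is illustrative:

```python
import xml.etree.ElementTree as ET

# Abridged metric definition in the format used by translate5 and the scorecard.
METRIC_XML = """\
<issues>
  <issue type="Accuracy" level="0" display="yes">
    <issue type="Mistranslation" level="1" display="yes">
      <issue type="Terminology" level="2" display="yes" />
    </issue>
    <issue type="Omission" level="1" display="yes" />
  </issue>
</issues>
"""

def displayed_issues(xml_text):
    """Return the issue types marked display="yes", in depth-first order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(node):
        for child in node.findall("issue"):
            if child.get("display") == "yes":
                found.append(child.get("type"))
            walk(child)

    walk(root)
    return found

print(displayed_issues(METRIC_XML))
# -> ['Accuracy', 'Mistranslation', 'Terminology', 'Omission']
```

A loader along these lines makes it easy to verify that the metric configured in the tool matches the metric reported in a paper.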


Figure 2. XML MQM metric definition file for use in translate5 and the MQM scorecard.

2.2 Selecting an Annotation Environment

There are a number of types of annotation environments:

• At the coarsest level are questionnaires, spreadsheets, and simple scorecard tools that simply count errors (but do not indicate their location within the text) or evaluate texts as a whole. These tools are useful for looking at features of the text as a whole, but do not provide detailed insight into specific errors. Such systems are generally not advisable for translation research tasks that involve error analysis (but they may be suitable in some production environments or for research projects where finer granularity is not needed).

• At a finer level of granularity are scorecard systems that store annotations at the segment level. They allow users to attach errors to specific segments, but not to specific words. They may support adding notes or highlighting text. These systems are typically easy to use but do not tie issues to specific locations; they are useful for quick annotation where it is sufficient to know which segments have which problems. The MQM Scorecard tool provides this functionality.

• Span-level annotation tools provide the ability to tie errors to particular spans in the text. Using them requires more training and care than is needed for the other tools, since issues have to be associated with spans of text. These tools provide the greatest insight into errors. The translate5 tool used for most QTLaunchPad tasks is this sort of tool.

The environment selected must support the analysis intended for the annotated data. In general, it is wise to err on the side of caution and ask for more detail rather than less. After selecting the annotation environment, it must be configured with the text(s) to be annotated and the appropriate metric definition.
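To make the span-level case concrete, a single annotation record could be modeled as simply as the following sketch; the field names are illustrative and are not taken from translate5 or the MQM Scorecard:

```python
from dataclasses import dataclass

@dataclass
class SpanAnnotation:
    """One error annotation tied to a character span within a segment."""
    segment_id: int
    start: int        # character offset where the span starts (inclusive)
    end: int          # character offset where the span ends (exclusive)
    issue_type: str   # an issue type from the MQM-compliant metric
    note: str = ""    # optional annotator comment

    def text(self, segment: str) -> str:
        """Return the marked span from the segment's target text."""
        return segment[self.start:self.end]

target = "The the cat sat on the mat."
ann = SpanAnnotation(segment_id=1, start=0, end=7,
                     issue_type="Grammar", note="duplicated word")
print(ann.text(target))  # -> "The the"
```

Segment-level tools would store the same record without the `start`/`end` offsets, which is precisely the detail that is lost when choosing a coarser environment.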

2.3 Selection of Annotators and Training

Annotation is an intellectually demanding task. Three typical layers of annotation in MT development are:

<issues>
  <issue type="Accuracy" level="0" display="yes">
    <issue type="Mistranslation" level="1" display="yes">
      <issue type="Terminology" level="2" display="yes" />
    </issue>
    <issue type="Omission" level="1" display="yes" />
    <issue type="Addition" level="1" display="yes" />
    <issue type="Untranslated" level="1" display="yes" />
  </issue>
  <issue type="Fluency" level="0" display="yes">
    <issue type="Content" level="1" display="no">
      <issue type="Register" level="2" display="yes" />
    </issue>
    <issue type="Mechanical" level="1" display="no">
      <issue type="Spelling" level="1" display="yes" />
      <issue type="Typography" level="1" display="yes" />
      <issue type="Grammar" level="1" display="yes" />
    </issue>
    <issue type="Unintelligible" level="1" display="yes" />
  </issue>
</issues>


1. The phenomenological level (target errors/issues)
2. The linguistic level (source or target POS, phrases, etc.)
3. The explanatory level (source/system-related causes for certain errors)

MQM annotation targets the phenomenological level. Depending on the complexity of the metric, it may require expert-level skill in both translation theory and linguistics. Within the QTLaunchPad project, it was found that expert human translators were ideal annotators. However, not all translators were equally capable. In general, those with formal training in linguistics or with previous experience in error annotation (e.g., using a company-specific error scorecard system) were the best prepared for MQM annotation.

As inter-annotator agreement (IAA) did not exceed 50%, even with training, it is important in research environments to have multiple annotators in order to control for variability between individuals. Based on experience in the QTLaunchPad project, it is recommended that three annotators be used, if possible. It is anticipated that IAA would increase with experience and feedback, but in most research scenarios it is unlikely that annotators will work with MQM for an extended period.

Training is vital since the task and the specific details of how to work with MQM-compliant metrics and tools are not immediately apparent, even to highly skilled individuals. In general, the following training steps and materials are required:

• A live demo of the annotation environment. This step is vital to ensure that annotators understand how to use the tool and are aware of all relevant features. Since annotation tools can be relatively complex, this demo should focus on a step-by-step explanation of the relevant process. It is recommended that the demo be recorded, if possible, for future reference.

• A decision tree and written annotation guidelines. A decision tree provides a relatively objective tool that helps guide the annotator to selection of the right issue. Written guidelines help annotators determine correct behavior in cases where the appropriate action is not self-evident (e.g., which portion of a text to mark when word order is wrong and multiple portions could be moved to fix the problem). These tools are discussed in Section 5 (Training materials) below.

• A calibration set. In this phase annotators are asked to work with a set where the error properties are well known to the researchers. The data in such a set could be "real" data or could be data with known errors introduced into it. Comparing the annotators' results for the calibration set with the ideal profile allows the researcher to identify any problems or confusions with the evaluation and provide corrective guidance before the research data is considered. Note that the calibration set should be representative of the data to be annotated, and it is highly recommended that it in fact be drawn from the same data set as the data to be annotated. (E.g., if 1000 segments out of 1500 are to be evaluated, 150 might be set aside for calibration, with the 1000 used for the research question then taken from the remaining 1350.)
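The calibration split described in the last point (150 of 1500 segments set aside, with the research set drawn from the remainder) can be sketched in a few lines; the function name and the fixed seed are illustrative choices, not part of the original workflow:

```python
import random

def split_calibration(segments, n_calibration, n_research, seed=0):
    """Randomly set aside a calibration set, then draw the research set
    from the remaining segments (both without replacement)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    pool = list(segments)
    rng.shuffle(pool)
    calibration = pool[:n_calibration]
    research = pool[n_calibration:n_calibration + n_research]
    return calibration, research

segments = [f"seg-{i}" for i in range(1500)]
calibration, research = split_calibration(segments, 150, 1000)
print(len(calibration), len(research))  # -> 150 1000
assert not set(calibration) & set(research)  # the two sets never overlap
```

Recording the seed alongside the data makes the split reproducible, which matters when the calibration results are later compared against the research annotations.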

2.4 Evaluation

The evaluation/annotation task may proceed after training is completed and the results of the calibration set are verified. Based on experience, it is recommended that the annotators work in short sessions (perhaps 30 minutes) with frequent breaks. The amount that can be evaluated in a given time frame depends on the number of errors present in the text: "cleaner" texts are faster to evaluate and annotate than "dirty" texts with many errors. For MT evaluation, there is often a significant portion of the text that has so many errors that annotation is counter-productive, since the nature of the errors may not be clear or the entire


text may be unintelligible. Therefore it is recommended that the annotators conduct an initial "triage" phase in which segments are quickly categorized into one of three categories:

• perfect segments (which do not need to be annotated),
• segments to be annotated (the QTLaunchPad project targeted segments with 1–3 errors), and
• "garbage" segments, which contain too many errors to be annotated.

Annotation can then focus on the second category without worrying about the other two. If the triage task is conducted by more than one individual, appropriate policies for reconciling differences of opinion should be established (e.g., if one annotator marks a sentence as perfect and another as needing annotation, it is probably wise to circulate it for annotation by all annotators).
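One possible reconciliation policy, generalizing the rule suggested above so that any disagreement routes the segment to annotation by all annotators, might be sketched as follows (the policy itself is an assumption; the report leaves the exact rules to the researcher):

```python
def reconcile_triage(votes):
    """Decide a segment's route from per-annotator triage votes.

    votes: a list of "perfect", "annotate", or "garbage" labels.
    Policy (one possibility): unanimous votes stand; any disagreement
    sends the segment for annotation by all annotators.
    """
    if len(set(votes)) == 1:
        return votes[0]
    return "annotate"

print(reconcile_triage(["perfect", "perfect", "perfect"]))   # -> perfect
print(reconcile_triage(["perfect", "annotate", "perfect"]))  # -> annotate
print(reconcile_triage(["garbage", "annotate", "perfect"]))  # -> annotate
```

A stricter project might instead route garbage/perfect disagreements to a discussion step; the point is simply that the policy should be written down before triage begins.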

2.5 Analysis

Multiple types of analysis are possible. Aggregate figures are often useful if multiple MT systems are being compared, as they can reveal system-level differences across engines. For determining the causes of specific errors, detailed analysis of specific issues is required. Whatever analysis is intended, it is important that data be preserved at all stages of transformation (e.g., if errors are extracted, the process should make a copy of the original data), since it is easy to make mistakes that can result in irretrievable data loss.
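One aggregate figure worth computing early is inter-annotator agreement over segment-level labels. The sketch below uses simple pairwise percent agreement; this is an illustrative measure, not necessarily the one used in the project, and a chance-corrected statistic (e.g., kappa) would be stricter:

```python
from itertools import combinations

def pairwise_agreement(label_sets):
    """Mean fraction of segments on which each pair of annotators agrees.

    label_sets: one list of segment labels per annotator, aligned by index.
    """
    scores = []
    for a, b in combinations(label_sets, 2):
        matches = sum(x == y for x, y in zip(a, b))
        scores.append(matches / len(a))
    return sum(scores) / len(scores)

# Three annotators labeling five segments (toy data).
ann1 = ["Omission", "Grammar", "Omission", "Terminology", "Grammar"]
ann2 = ["Omission", "Spelling", "Omission", "Terminology", "Grammar"]
ann3 = ["Addition", "Grammar", "Omission", "Terminology", "Spelling"]
print(round(pairwise_agreement([ann1, ann2, ann3]), 2))
```

On the toy data the pairs agree on 4/5, 3/5, and 2/5 of the segments, for a mean of 0.6, in line with the roughly 50% agreement reported above.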

3 Costs

Based on the QTLaunchPad tasks, which focused on "near miss" translations, the direct costs of annotation, including triage selection of data to annotate, were approximately €1.50/segment.2 With previously trained annotators, the amount would probably drop to €1.00–1.25/segment. However, costs for MQM-based analysis are highly variable. For text with few errors, annotation would be quite inexpensive; for text with many errors, annotation would be much more expensive. This variability is one of the reasons why a triage phase is strongly recommended, since it allows the researcher to select segments with relatively predictable costs. The cost per issue in the QTLaunchPad tasks was approximately €0.75. Since the number of issues will vary between tasks, cost per issue cannot predict costs, but it gives an idea of the productivity of evaluators. Finally, from the QTLaunchPad tasks the cost per word of annotation comes out to around €0.07–0.09/word. Accordingly, for pre-selected items with relatively few errors, the cost per thousand words would be around €7–9. If multiple annotations are factored in, the costs are multiplied by the number of annotators. In order to obtain sound data, then, the best estimate at present is that the cost is between €20 and €30 per 1000 words (assuming triple annotation). These figures do not include management or analysis, which can easily add 100–200% on top of the direct costs.

2 These figures are based on a payment system that paid a flat fee for a certain amount of text. An hourly fee was not used in QTLaunchPad because there was no previous experience on which to estimate time. However, if a fee of €50/hour is used with trained annotators, the figures presented here would be 15–20% lower for the sorts of text evaluated in the project.
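As a back-of-the-envelope sketch of how these figures combine, using the per-segment rate reported above (the function and defaults are illustrative, not a costing tool from the project):

```python
def annotation_cost(n_segments, rate_per_segment=1.50, n_annotators=1):
    """Direct annotation cost in euros.

    Excludes management and analysis, which the report notes can add
    100-200% on top of the direct costs.
    """
    return n_segments * rate_per_segment * n_annotators

# 100 "near miss" segments at ~1.50 EUR/segment, triple annotation:
print(annotation_cost(100, 1.50, 3))  # -> 450.0
```

Budgeting this way makes explicit how quickly triple annotation multiplies costs, which is worth keeping in mind when deciding how many annotators a study can afford.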


4 Amount of text required

There is no firm guidance for the amount of material needed for annotation. Based on QTLaunchPad results, it is possible to detect trends and identify major issue types with as few as 100–150 segments. Identifying rarer phenomena would require more data, since interesting phenomena would be expected to display a "long-tail" distribution, with certain kinds of errors (and causes) accounting for the bulk of problems, while other errors are less common. If the goal is just to identify the high-level distribution, small data sets may suffice, but in the QTLaunchPad project a concerted effort was made to identify the causes of problems, a task which required many more segments. As is typical, the more data one has, the better.

5 Training materials

The most useful training materials are annotation guidelines and decision trees. A set of annotation guidelines and a decision tree initially developed in QTLaunchPad and updated for use in the QTLeap project are included at the end of this document. The following subsections describe these resources and how to create them.

5.1 Decision trees

Decision trees are useful tools for learning a specific MQM metric's issue types and distinguishing between them. They are especially useful as a learning aid and for determining which issue applies in cases where the answer is not immediately apparent. There are at least as many possible decision trees as there are MQM metrics (more, in fact, because decision trees can present issues in multiple orders). This document provides some guidance for making decision trees.

Decision trees should work through branches of the hierarchy, with a single question separating each branch from other branches. This requirement is important because all children of a particular issue type (an issue and its children constitute a branch) could be classified as the parent type, so a single question is needed that can distinguish all of them as a group from other issues.

After determining which node an issue is contained within, it is important to resolve more specific issue types before more general ones. This guideline works on the principle of exclusion: by eliminating specific cases, the general case is what remains. For example, if an MQM metric has the following structure for Accuracy:

• Accuracy
  • Mistranslation
    • Terminology
      • Company terminology
    • Number
  • Omission
    • Omitted variable

The process to work through the hierarchy is as follows:


• Determine whether the issue is a type of Mistranslation or, if it is not, if it is a type of Omission. Since these are the specific types of Accuracy, they need to be eliminated before declaring the issue a general Accuracy issue.

• If it is one of the subtypes, questions must determine if the issue is one of their children (or grandchildren). For example, if an issue is a type of Mistranslation, the question "Was a number mistranslated?" would identify (or rule out) a mistranslation of a number; the question "Is a term translated incorrectly?" would identify (or rule out) Terminology. If the answer to both of those questions is "No", then the issue is Mistranslation.

• A similar principle would ask the evaluator to rule out Company terminology before using a general Terminology issue type.

In accordance with the above, a decision tree for the Accuracy branch in this metric might look like the following:

1. Is content present in the source inappropriately omitted from the target? [This question selects or excludes Omission]
   • Yes: Go to question 2 [We know it is a type of Omission]
   • No: Go to question 3 [Omission has been excluded, so now we need to see if it is another type of Accuracy]
2. Is a variable omitted from the target content? [Tells us if the specific subtype of Omission should be selected]
   • Yes: Omitted variable
   • No: Omission [We have excluded the subtype of Omission, leaving the general type]
3. Are words or phrases translated incorrectly (i.e., is meaning conveyed by the source changed in the target)? [This question selects or excludes Mistranslation]
   • Yes: Go to question 4 [The issue is a type of Mistranslation]
   • No: Accuracy [Both Omission and Mistranslation have now been excluded, leaving only Accuracy]
4. Were numbers translated incorrectly? [Selects or excludes Number]
   • Yes: Number
   • No: Go to question 5 [Number is excluded, so we move on]
5. Is a domain- or organization-specific word or phrase translated incorrectly? [Selects or excludes Terminology]
   • Yes: Go to question 6 [We know it is a type of Terminology]
   • No: Mistranslation [We have excluded every other option]
6. Is the word or phrase translated contrary to company-specific terminology guidelines? [Identifies or excludes Company terminology]
   • Yes: Company terminology
   • No: Terminology [Company terminology has been excluded, leaving the more general Terminology]

Note that the order in which the children of an element are tested is theoretically unimportant. For example, question 1 above could have served to select or exclude Mistranslation and question 3 could have focused on Omission. The important aspect is that subtypes are excluded before selecting a general type. Although there is no theoretical principle for placing one issue type before another, there may be practical reasons to do so: decision trees should be optimized for efficiency. If it is expected that one issue type will be quite rare while its sibling will be more common, the more common sibling should be placed first to make it easier to find.
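The exclusion logic above lends itself to a simple data representation. The sketch below (an illustration, not project code) encodes each node as a question with yes/no branches, where a branch is either another question or a final issue type:

```python
# Each internal node is (question, yes_branch, no_branch); leaves are issue types.
ACCURACY_TREE = (
    "Is content present in the source inappropriately omitted from the target?",
    ("Is a variable omitted from the target content?",
     "Omitted variable", "Omission"),
    ("Are words or phrases translated incorrectly?",
     ("Were numbers translated incorrectly?",
      "Number",
      ("Is a domain- or organization-specific word or phrase translated incorrectly?",
       ("Is it translated contrary to company-specific terminology guidelines?",
        "Company terminology", "Terminology"),
       "Mistranslation")),
     "Accuracy"),
)

def classify(tree, answer):
    """Walk the tree with a predicate `answer(question) -> bool`."""
    while isinstance(tree, tuple):
        question, yes, no = tree
        tree = yes if answer(question) else no
    return tree

# An annotator who answers "yes" only to the mistranslation question:
answers = {"Are words or phrases translated incorrectly?"}
print(classify(ACCURACY_TREE, lambda q: q in answers))  # -> Mistranslation
```

Encoding the tree as data rather than prose also makes it easy to check that every leaf of the metric is reachable, which catches gaps when a metric is revised.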


5.1.1 A generalized decision tree

The attached decision tree covers the full MQM hierarchy. It is not expected that the entire tree will be used, but individual questions can be taken from this decision tree to build specific trees. (Note that, due to its complexity, this tree is optimally printed on A0 paper. An A4-sized version is included in this document for reference. The full-sized version is available at http://qt21.eu/downloads/fullDecisionTreeComplete.pdf.)

Note that portions of the tree generally should not be used without their parent issue unless the decision tree is intended to document only specific errors and not general types. For example, selecting Company terminology without its ancestor nodes Terminology, Mistranslation, and Accuracy would result in a tree that cannot identify more general error types. This approach might be appropriate if the only issue being assessed is adherence to company terminology guidelines. If a metric is created that identifies only specific subtypes (e.g., a metric that counts only terminology violations and distinguishes between company and normative terminology), a decision tree is still possible, but could not be made from this resource without modification.

To extract a portion of the tree that is less granular than the full tree, it is necessary to remove any unneeded children of the types to be assessed. Guidance for removing these issues is beyond the scope of this description, but the process is relatively straightforward if the MQM hierarchy is understood. Note that the specific questions may vary from those presented in the decision tree as long as they are capable of identifying the appropriate issues. The specific questions presented here are not to be treated as normative.
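Removing unneeded children from an issue hierarchy, as just described, could be sketched over a nested-dict representation (an illustration under the assumption that the hierarchy is stored as nested mappings, which is not the format of the attached tree):

```python
def prune(hierarchy, keep):
    """Drop any branch whose issue type is not in `keep`.

    hierarchy: {issue_type: child_hierarchy_dict}
    A type in `keep` survives with only its surviving children;
    pruning a parent removes its whole branch.
    """
    return {
        issue: prune(children, keep)
        for issue, children in hierarchy.items()
        if issue in keep
    }

mqm = {"Accuracy": {"Mistranslation": {"Terminology": {}}, "Omission": {}}}
print(prune(mqm, {"Accuracy", "Mistranslation", "Omission"}))
# -> {'Accuracy': {'Mistranslation': {}, 'Omission': {}}}
```

Note how this respects the guideline above: dropping Terminology leaves its parent Mistranslation intact, while a metric must never keep a child without its ancestors.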

5.2 Annotation guidelines

Annotation guidelines provide practical guidance for the annotator. They need to provide a definition of the metric, instructions for how to realize the metric in the chosen annotation environment, and any specific items that need special attention. The guidelines are used in training, but also for reference during annotation; therefore they need to be short and accessible. It may be advisable to maintain the guidelines in an accessible format where changes can be made to address queries and concerns that arise during annotation. A sample set of annotation guidelines is included at the end of this document. (The provided guidelines were given to annotators working on the MQM corpora analyzed in D1.3.1.)

Page 12: Quality Translation 21 - Supplement 1 Practical Guidelines for ...Preparation and Launch of a Large-scale Action for Quality Translation Technology Practical Guidelines for the Use

Guide to selecting MQM issues for the MT Evaluation Metric

version 1.4 (2014 November 17)

Selecting issues can be a complex task. To assist evaluators, a decision tree helps in selecting appropriate issues. Use the decision tree not only for learning about MQM issues, but also to guide your annotation efforts and resolve any questions or concerns you may have.

Start at the upper left corner of the decision tree and then answer the questions and follow the arrows to find appropriate issues.

If using translate5, note that the decision tree is organized somewhat differently than the hierarchy in translate5 because it eliminates specific issue types before moving to general ones, so familiarize yourself with how issues are organized in translate5 before beginning annotation.

Add notes in the translate5 scorecard to explain any decisions that you feel need clarification, to ask questions, or to provide information needed to understand issues, such as notes about what has been omitted in a translation.

In addition to using the decision tree, please understand and follow the guidelines in this document. Email us at [email protected] if you have questions that the decision tree and other content in this document do not address.

1. What is an error?

An error represents any issue you may find with the translated text that either does not correspond to the source or is considered incorrect in the target language. The list of language issues upon which you are to base your annotation is described in detail below and provides a range of examples.

The list is divided into two main issue categories, Accuracy and Fluency, each of which contains relevant, more detailed subcategories. Whenever possible, the correct subcategory should be chosen; however, if in doubt, please do not guess. Instead, select the category level about which you are most certain in order to avoid inconsistencies in the results.

Example: The German term Zoomfaktor was incorrectly translated as zoom shot factor, and you are unsure whether this represents a Mistranslation or an Addition. In this case, categorize the error as an Accuracy error since it is unclear whether content has been added or a term mistranslated.

2. The Annotation Process

The translations you annotate should be a set of “near miss” (i.e., “almost perfect”) translations. Please follow these rules when selecting errors and tagging the respective text in the translations:

1. Use the examples in this documentation to understand specific classes.

2. If multiple types could be used to describe an issue (e.g., Agreement, Word form, Grammar, and Fluency), select the first one that the decision tree guides you to. The tree is organized along the following principles:

a. It prefers more specific types (e.g., Part of speech) to general ones (e.g., Grammar). However, if a specific type does not apply, it guides you to use the general type.

b. General types are used where the problem is of a general nature or where the specific problem does not have a precise type. For example He slept the baby exhibits what is technically known as a valency error, but because there is no specific type for this error available, it is assigned to Grammar.

3. Less is more. Only tag the relevant text. For example, if a single word is wrong in a phrase, tag only the single word rather than the entire phrase. If two words, separated by other words, constitute an error, mark only those two words separately. (See the section on “minimal markup” below.)

4. If correcting one error would take care of others, tag only that error. For example, if fixing an Agreement error would fix other related issues that derive from it, tag only the Agreement error, not the errors that result from it.


MQM annotators guidelines (version 1.4, 2014-11-17) Page 2

[Figure: MQM Annotation Decision Tree (A4 version). The flowchart starts from the question “Is the issue related to the fact that the text is a translation (e.g., the target text does not mean what the source text does)?” and leads through yes/no questions to an issue type:

• Is source content inappropriately omitted from the target? → Omission
• Has unneeded content been added to the target text? → Addition
• Is there text in the source language that should have been translated? → Untranslated
• Are terms translated incorrectly for the domain or contrary to any terminology resources? → Terminology
• Are words or phrases translated inappropriately? → Mistranslation; otherwise Accuracy (general)*
• Are one or more words misspelled/capitalized incorrectly? → Spelling
• Is typography, other than misspelling or capitalization, used incorrectly? → Typography
• Is the text grammatically incorrect? → Is the wrong form of a word used? (Is the part of speech incorrect? → Part of speech; Is a wrong verb form or tense used? → Tense/mood/aspect; Do two or more words not agree for person, number, or gender? → Agreement; otherwise Word form (general)); Do words appear in the wrong order? → Word order; Are “function words” (prepositions, articles, “helper” verbs, etc.) incorrect? (Does an unneeded function word appear? → Extraneous; Is a needed function word missing? → Missing; Is an incorrect function word used? → Incorrect; otherwise Function words (general)); otherwise Grammar (general)
• Is the text garbled or otherwise impossible to understand? → Unintelligible; otherwise Fluency (general)*

Note: For any question, if the answer is unclear, select “No”.
* Please describe any Fluency (general) or Accuracy (general) issues using the Notes feature.]



Examples

Source: Importfilter werden geladen
Translation: Import filter are being loaded
Correct: Import filters are being loaded

In this example, the only error is the translation of filter in the singular rather than the plural (as made clear by the verb form in the source text). This case should be classified as Mistranslation, even though it shows problems with agreement: if the subject had been translated properly the agreement problem would be resolved. In this case only filter should be tagged as a Mistranslation.

Source: im Dialog Exportieren
Translation: in the dialog export
Correct: in the Export dialog

In this example, only Mistranslation should be marked. While Word order and Spelling (capitalization) would be considered errors in other contexts, this would not be the case here, as these two words constitute one term that has been incorrectly translated.

5. If one word contains two errors (e.g., it has a Spelling issue and is also an Extraneous function word), enter both errors separately and mark the respective word in both cases.

6. If in doubt, choose a more general category. The categories Accuracy and Fluency can be used if the nature of an error is unclear. In such cases, providing notes to explain the problem will assist the QTLaunchPad team in its research.
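The traversal in rule 2 can be sketched programmatically. The three-question tree below is a tiny illustrative slice, not the full flowchart; unclear answers default to “No”, as the tree's own note instructs:

```python
# Each internal node is (question, yes_branch, no_branch); leaves are issue names.
# This is a small, illustrative slice of the decision tree, not the full MQM flowchart.
TREE = (
    "Is the issue related to the fact that the text is a translation?",
    (
        "Are words or phrases translated inappropriately?",
        "Mistranslation",
        "Accuracy (general)",
    ),
    (
        "Is the text grammatically incorrect?",
        "Grammar (general)",
        "Fluency (general)",
    ),
)

def classify(node, answers):
    """Walk the tree. `answers` maps questions to True/False; a missing
    (unclear) answer defaults to False, i.e. "No"."""
    while isinstance(node, tuple):
        question, yes_branch, no_branch = node
        node = yes_branch if answers.get(question, False) else no_branch
    return node
```

For instance, answering “Yes” to both accuracy-side questions yields "Mistranslation", while answering nothing at all falls through to "Fluency (general)".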

3. Tricky cases

The following examples are ones that have been encountered in practice and that we wish to clarify.

• Function words: In some cases issues related to function words break the accuracy/fluency division seen in the decision tree because they are listed under Fluency even though they may impact meaning. Despite this issue, please categorize them as the appropriate class under Function words.

Example: The ejector may be found with the external case (should be on in this case). Even though this error changes the meaning, it should be classified as Function words: incorrect in the Fluency branch.

• Word order: Word order problems often affect long spans of text. When encountering word order errors, mark the smallest possible portion that could be moved to correct the problem.

Example: He has the man with the telescope seen. Here only seen should be marked as moving this one word would fix the problem.

• Hyphenation: Hyphenation issues sometimes occur in untranslated content and should be classified as such. Otherwise they should be classified as Spelling.

Example: Load the XML-files (Spelling)
Nützen Sie die macro-lens (Untranslated, if the source has macro-lens as well)

• Number (plural vs. singular) is a Mistranslation.

• Terminology: Inappropriate use of terms, as distinct from general-language Mistranslation.

Example: An English translation uses the term thumb drive to translate the German USB Speicherkarte. This translation is intelligible, but if the translation mandated in specifications or a relevant termbase is USB memory stick, the use of thumb drive constitutes a Terminology error, even if thumb drive would be acceptable in everyday usage. However, if USB Speicherkarte were to be translated as USB Menu, this would be a Mistranslation since the words would be translated incorrectly, regardless of whether the original phrase is a term.

NOTE: Because no terminology list is provided, please use your understanding of relevant IT terminology for the evaluation task.

• Unintelligible: Use Unintelligible if content cannot be understood and the reason cannot be analyzed according to the decision tree. This category is used as a last resort for text where the nature of the problem is not clear at all.

Example: In the sentence “You can also you can use this tab to precision, with the colours are described as well as the PostScript Level,” there are enough errors that the meaning is unclear and the precise nature of the errors that lead to its unintelligibility cannot be easily determined.

• Agreement: This category generally refers to agreement between subject and predicate, or agreement in gender and case.

Examples: The boy was playing with her own train
I is at work

• Untranslated: Many words may look as if they have been translated, with only proper capitalization or hyphenation rules left unapplied. In most cases, this represents an untranslated term and not a Spelling error. If the target word or phrase is identical to the source word or phrase, it should be treated as Untranslated, even if a Spelling error could also account for the problem.
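This rule can be sketched as a small check. The normalization step (ignoring case and hyphens) is our assumption about what “proper capitalization or hyphenation rules” covers, not something the guidelines specify:

```python
def untranslated_or_spelling(source_term, target_term):
    """Per the guideline sketched above: a target term identical to the
    source term (even if only capitalization/hyphenation was changed)
    is Untranslated, not Spelling."""
    def normalize(s):
        # Assumed normalization: ignore hyphenation and capitalization.
        return s.replace("-", "").lower()

    if normalize(target_term) == normalize(source_term):
        return "Untranslated"
    return "Spelling"
```

For example, `untranslated_or_spelling("macro-lens", "macro-lens")` returns "Untranslated", matching the XML-files/macro-lens example above.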

4. Minimal markup

It is vital in creating error markup that errors be marked up with the shortest possible spans. Markup must identify only that area needed to specify the problem. In some cases this requirement means that two separate spans must be identified.

The following examples help clarify the general principles:

Incorrect markup: Double click on the [number faded in] the status bar. [Mistranslation]
Problem: Only the single word faded is problematic, but the markup indicates that number faded in is incorrect.
Correct minimal markup: Double click on the number [faded] in the status bar.

Incorrect markup: The standard font size for dialogs is 12pt, which corresponds to a standard of 100%. [Mistranslation]
Problem: Only the term Maßstab has been translated incorrectly. The larger span indicates that text that is perfectly fine has a problem.
Correct minimal markup: The standard font size for dialogs is 12pt, which corresponds to a [standard] of 100%.

Incorrect markup: The in 1938 nascent leader with flair divined %temp_name eating lonely. [Unintelligible]
Problem: The entire sentence is Unintelligible and should be marked as such.
Correct minimal markup: [The in 1938 nascent leader with flair divined %temp_name eating lonely.]

As noted above, Word order can be problematic because it is often unclear what portion(s) of the text should be marked. In cases of word order, mark the shortest portion of text (in number of words) that could be moved to fix the problem. If two portions of the text could resolve the problem and are equal in length, mark the one that occurs first in the text. The following examples provide guidance:

Incorrect markup: The telescope big observed the operation
Problem: Moving the word telescope would solve the problem and only this word should be marked (since it occurs first in the text).
Correct minimal markup: The [telescope] big observed the operation

Incorrect markup: The eruption by many instruments was recorded.
Problem: Although this entire portion shows word order problems, moving was recorded would resolve the problem (and is the shortest span that would resolve the problem).
Correct minimal markup: The eruption by many instruments [was recorded].

Incorrect markup: The given policy in the manual user states that this action voids the warranty.
Problem: This example actually has two separate issues that should be marked separately.
Correct minimal markup: The [given] policy in the [manual] user states that this action voids the warranty.

Agreement poses special challenges because portions that disagree may be widely separated. To select appropriate minimal spans, consider the following guidelines:

• If two items disagree and it is readily apparent which should be fixed, mark only the portion that needs to be fixed. E.g., in “The man and its companion were business partners” it is readily apparent that its should be his and the wrong grammatical gender has been used, so only its should be marked.

• If two items disagree and it is not clear which portion is incorrect, mark both items, each for Agreement, as shown in the example in the table below.

The following examples demonstrate how to mark Agreement:

Incorrect markup: The man and its companion were business partners. [Agreement]
Problem: In this example, it is clear that its is the problematic portion, and that man is correct, so only its should be marked.
Correct minimal markup: The man and [its] companion were business partners.

Incorrect markup: The man whom they saw on Friday night at the store were very big. [Agreement]
Problem: In this example it is not clear whether man or were is the error since there is nothing to indicate whether singular or plural is intended. Here the highlighted portion identifies only a single word, insufficient to identify the agreement problem. The correct version highlights both words as separate issues. In such cases use the Notes field to explain the decision.
Correct minimal markup: The [man] whom they saw on Friday night at the store [were] very big. [Agreement]

In the event of questions about the scope of markup that should be used, utilize the Notes field to make a query or explain your choice.



A. Issue categories

The error corpus uses the following issue categories:

• Accuracy. Accuracy addresses the extent to which the target text accurately renders the meaning of the source text. For example, if a translated text tells the user to push a button when the source tells the user not to push it, there is an accuracy issue.

• Mistranslation. The target content does not accurately represent the source content.

Example: A source text states that a medicine should not be administered in doses greater than 200 mg, but the translation states that it should not be administered in doses less than 200 mg.

Note(s): Mistranslation can be used for both words and phrases.

• Terminology. Domain- or industry-specific terms (including multi-word terms) are translated incorrectly.

Example: In a musicological text the term dog is encountered and translated into German as Hund rather than the domain-specific term Schnarre.

Note(s): Terminology errors may be valid translations for the source word in general language, but are incorrect for the specific domain or organization.

• Omission. Content is missing from the translation that is present in the source.

Example: A source text refers to a “mouse pointer” but the translation does not mention it.

Note(s): Omission should be reserved for those cases where content present in the source and essential to its meaning is not found in the target text.

• Addition. The target text includes text not present in the source.

Example: A translation includes portions of another translation that were inadvertently pasted into the document.

• Untranslated. Content that should have been translated has been left untranslated.

Example: A sentence in a Japanese document translated into English is left in Japanese.

Note(s): As noted above, if a term is passed through untranslated, it should be classified as Untranslated rather than as Mistranslation.

• Fluency. Fluency relates to the monolingual qualities of the source or target text, relative to agreed-upon specifications, but independent of the relationship between source and target. In other words, fluency issues can be assessed without regard to whether the text is a translation or not. For example, a spelling error or a problem with register remains an issue regardless of whether the text is translated or not.

• Spelling. Issues related to the spelling of words (including capitalization).

Examples: The German word Zustellung is spelled Zustetlugn. The name John Smith is written as “john smith”.

• Typography. Issues related to the mechanical presentation of text. This category should be used for any typographical errors other than spelling.

Examples: Extra, unneeded carriage returns are present in a text. A semicolon is used in place of a comma.

• Grammar. Issues related to the grammar or syntax of the text, other than spelling and orthography.

Example: An English text reads “The man was in seeing the his wife.”

Note(s): Use Grammar only if no subtype accurately describes the issue.



• Word form. The wrong form of a word is used. Subtypes should be used when possible.

Example: An English text has comed instead of came.

• Part of speech. A word is the wrong part of speech.

Example: A text reads “Read these instructions careful” instead of “Read these instructions carefully.”

• Agreement. Two or more words do not agree with respect to case, number, person, or other grammatical features.

Example: A text reads “They was expecting a report.”

• Tense/aspect/mood. A verbal form inappropriate for the context is used.

Example: An English text reads “Yesterday he sees his friend” instead of “Yesterday he saw his friend”; an English text reads “The button must be pressing” instead of “The button must be pressed”.

• Word order. The word order is incorrect.

Example: A German text reads “Er hat gesehen den Mann” instead of “Er hat den Mann gesehen.”

• Function words. Linguistic function words such as prepositions, particles, and pronouns are used incorrectly.

Example: An English text reads “He beat him around” instead of “he beat him up.”

Note(s): Function words is used for cases where individual words with a grammatical function are used incorrectly. The most common problems will have to do with prepositions and particles. For languages where verbal prefixes play a significant role in meaning (as in German), they should be included here, even if they are not independent words.

There are three subtypes of Function words. These are used to indicate whether an unneeded function word is present (Extraneous), a needed function word is missing (Missing), or an incorrect function word is used (Incorrect). Evaluators should use the Notes field to specify details for missing function words.

• Unintelligible. The exact nature of the error cannot be determined. Indicates a major breakdown in fluency.

Example: The following text appears in an English translation of a German automotive manual: “The brake from whe this કુતારો િસ S149235 part numbr,,."

Note(s): Use this category sparingly for cases where further analysis is too uncertain to be useful. If an issue is categorized as Unintelligible no further categorization is required. Unintelligible can refer to texts where a significant number of issues combine to create a text for which no further determination of error type can be made or where the relationship of target to source is entirely unclear.


Star

t h

ere

Ver

ity

Inte

rnat

ion

-al

izat

ion

Des

ign

Flu

ency

Subt

ypes

of I

nter

-na

tiona

lizat

ion

are

curr

ently

und

efine

d.

Co

mp

atab

ilit

y (d

epre

cate

d) –

Thes

e issu

es ar

e inc

luded

prim

arily

for c

ompa

tabil

ity w

ith th

e LISA

QA M

odel

Appl

icat

ion

com

patib

ility

• B

ill o

f mat

eria

ls/ru

nlist

• Bo

ok-b

uild

ing

sequ

ence

• C

over

s • D

eadl

ine

• Del

iver

y • D

oes n

ot a

dher

e to

spec

ifica

tions

• Em

bedd

ed te

xt •

File

form

at •

Func

tiona

l • O

utpu

t dev

ice

• Prin

ting

• Rel

ease

gui

de •

Spin

es •

Styl

e, pu

blish

ing

stan

dard

s • T

erm

inol

ogy,

cont

extu

ally

inap

prop

riate

To L

earn

Mor

e:

A1

Has

con

tent

pre

sent

in th

e so

urce

be

en in

appr

opria

tely

om

itted

from

th

e ta

rget

?

Yes

Go

to A

2No

Go

to A

3

A2

Is a

var

iabl

e om

itted

from

the

targ

et

cont

ent?

Yes

Omit

ted

varia

ble

NoOm

issi

on

A3

Has

con

tent

not

pre

sent

in th

e so

urce

bee

n in

appr

opria

tely

add

ed

to th

e so

urce

?

Yes

Addi

tion

NoG

o to

A4

A4

Has

con

tent

bee

n le

ft in

the

sour

ce

lang

uage

that

sho

uld

have

bee

n tr

ansl

ated

?

Yes

Go

to A

5No

Go

to A

6

A5

Is th

e un

tran

slat

ed c

onte

nt in

a

grap

hic?

Yes

Untr

ansl

ated

gra

phic

NoUn

tran

slat

ed

A6

Are

wor

ds o

r phr

ases

tran

slat

ed

inco

rrec

tly?

Yes

Go

to A

7No

Accu

racy

(gen

eral)

A7

Is a

dom

ain-

or o

rgan

izat

ion-

spec

ific

wor

d or

phr

ase

tran

slat

ed

inco

rrec

tly?

Yes

Go

to A

8No

Go

to A

10

A8

Is th

e w

ord

or p

hras

e tr

ansl

ated

co

ntra

ry to

com

pany

-spe

cific

te

rmin

olog

y gu

idel

ines

?

Yes

Com

pany

term

inol

ogy

NoG

o to

A9

A9

Is th

e w

ord

or p

hras

e tr

ansl

ated

co

ntra

ry to

gui

delin

es e

stab

lishe

d in

a

norm

ativ

e do

cum

ent (

e.g.

, law

or

stan

dard

)?

Yes

Norm

ativ

e te

rmin

olog

yNo

Term

inol

ogy

A10

Is th

e tr

ansl

atio

n ov

erly

lite

ral?

Yes

Over

ly li

tera

lNo

Go

to A

11

A11

Is th

e tr

ansl

ated

con

tent

a “f

alse

fr

iend

” (fa

ux a

mi)?

Yes

Fals

e fr

iend

NoG

o to

A12

A12

Is a

nam

ed e

ntity

(suc

h as

the

nam

e of

a p

erso

n, p

lace

, or o

rgan

izat

ion)

tr

ansl

ated

inco

rrec

tly?

Yes

Enti

tyNo

Go

to A

13

A13

Was

con

tent

tran

slat

ed th

at s

houl

d no

t hav

e be

en tr

ansl

ated

?

Yes

Shou

ld n

ot h

ave

been

tran

slat

edNo

A14

A14

Was

a d

ate

or ti

me

tran

slat

ed

inco

rrec

tly?

Yes

Date

/tim

eNo

A15

A15

Wer

e un

its (e

.g.,

for m

easu

rem

ent o

r cu

rren

cy) t

rans

late

d in

corr

ectly

?

Yes

Unit

conv

ersi

onNo

A16

A16

Wer

e nu

mbe

rs tr

ansl

ated

in

corr

ectly

?

Yes

Num

ber

NoA

17

A17

Is th

e tr

ansl

atio

n in

impr

oper

exa

ct

mat

ch fr

om tr

ansl

atio

n m

emor

y?

Yes

Impr

oper

exa

ct m

atch

NoM

istr

ansl

atio

n

F1

Is th

e co

nten

t writ

ten

at a

leve

l of

form

ality

inap

prop

riate

for t

he

subj

ect m

atte

r, au

dien

ce, o

r tex

t ty

pe?

Yes

Go

to F

2No

Go

to F

3

F2

Doe

s th

e co

nten

t use

sla

ng o

r oth

er

unsu

itabl

e w

ord

varia

nts?

Yes

Varia

nts/

slan

gNo

Regi

ster

F3

Is th

e co

nten

t sty

listic

ally

in

appr

opria

te?

Yes

Styl

isti

csNo

Go

to F

4

F4

Is th

e co

nten

t inc

onsi

sten

t with

its

elf?

Yes

Go

to F

5No

Go

to F

10

F5

Are

abb

revi

atio

ns u

sed

inco

nsis

tent

ly?

Yes

Abbr

evia

tion

sNo

Go

to F

6

F6

Is te

xt in

cons

iste

nt w

ith g

raph

ics?

Yes

Imag

e vs

. tex

tNo

Go

to F

7

F7

Is th

e di

scou

rse

stru

ctur

e of

the

cont

ent i

ncon

sist

ent?

Yes

Disc

ours

eNo

Go

to F

8

F8

Is te

rmin

olog

y in

cons

iste

nt w

ithin

th

e co

nten

t (w

ithou

t bei

ng a

mis

-tr

ansl

atio

n)?

Yes

Term

inol

ogic

al in

cons

iste

ncy

NoG

o to

F9

F9

Are

cro

ss-r

efer

ence

s or

link

s in

cons

iste

nt in

wha

t the

y po

int t

o?

Yes

Inco

nsis

tent

link

/cro

ss-re

fere

nce

NoIn

cons

iste

ncy

F10

Doe

s th

e co

nten

t use

uni

diom

atic

ex

pres

sion

s?

Yes

Unid

iom

atic

NoG

o to

F11

F11

Is c

onte

nt in

appr

opria

tely

du

plic

ated

?

Yes

Dupl

icat

ion

NoG

o to

F12

F12

Is th

e w

rong

term

use

d? (G

ener

ally

as

sess

ed fo

r sou

rce

text

onl

y)

Yes

Go

to F

13No

Go

to F

14

F13

Is th

e te

rm u

sed

cont

rary

to

guid

elin

es e

stab

lishe

d in

a

norm

ativ

e do

cum

ent (

e.g.

, law

or

stan

dard

)?

Yes

Mon

olin

gual

nor

mat

ive

term

inol

ogy

NoM

onol

ingu

al te

rmin

olog

y

F14

Is th

e co

nten

t am

bigu

ous?

Yes

Go

to F

15No

Go

to F

16

F15

Is a

pro

noun

or o

ther

ling

uist

ical

ly

refe

rent

ial s

truc

ture

unc

lear

as

to it

s re

fere

nce/

ante

cede

nt?

Yes

Uncl

ear r

efer

ence

NoAm

bigu

ity

F16

Is c

onte

nt s

pelle

d in

corr

ectly

(in

clud

ing

inco

rrec

t cap

italiz

atio

n)?

Yes

Go

to F

17No

Go

to F

19

F17

Is c

onte

nt c

apita

lized

inco

rrec

tly?

Yes

Capi

taliz

atio

nNo

Go

to F

18

F18

Are

dia

criti

cs (e

.g.,

¨, ´,

˝, ˜)

mis

sing

or

inco

rrec

t?

Yes

Diac

ritic

sNo

Spel

ling

F19

Doe

s th

e co

nten

t vio

late

a fo

rmal

st

yle

guid

e (e

.g.,

Chic

ago

Man

ual o

f St

yle

or o

rgan

izat

ion

styl

e gu

ide)

?

Yes

Go

to F

20No

Go

to F

22

F20

Is th

e vi

olat

ion

spec

ific

to a

co

mpa

ny/o

rgan

izat

ion’

s in

tern

al/

hous

e st

yle

guid

e?

Yes

Com

pany

styl

eNo

Go

to F

21

F21

Is th

e vi

olat

ion

of a

third

-par

ty

styl

e gu

ide

(e.g

. Chi

cago

Man

ual

of S

tyle

, Am

eric

an P

sych

olog

ical

A

ssoc

iatio

n)?

Yes

3rd-

part

y sty

leNo

Styl

e gu

ide

F22

Doe

s th

e co

nten

t dis

play

pro

blem

s w

ith ty

pogr

aphy

(spa

cing

or

punc

tuat

ion)

Yes

Com

pany

styl

eNo

Go

to F

26

F23

Are

quo

te m

arks

or b

rack

ets

unpa

ired

(i.e.

, one

of a

pai

red

set o

f pu

nctu

atio

n is

mis

sing

)?

Yes

Unpa

ired

quot

e m

arks

or b

rack

ets

NoG

o to

F24

F24

Is p

unct

uatio

n us

ed in

corr

ectly

?

Yes

Punc

tuat

ion

NoG

o to

F25

F25

Is w

hite

spac

e us

ed in

corr

ectly

(i.e

., m

issi

ng, e

xtra

, inc

onsi

sten

t)?

Yes

Whi

tesp

ace

NoTy

pogr

aphy

F26

Is th

e co

nten

t gra

mm

atic

ally

in

corr

ect?

Yes

Go

to F

27No

Go

to F

33

F27

Is a

n in

corr

ect f

orm

of a

wor

d us

ed?

Yes

Go

to F

28No

Go

to F

31

F28

Is th

e w

rong

par

t of s

peec

h us

ed?

Yes

Part

of s

peec

hNo

Go

to F

29

F29

Doe

s th

e co

nten

t sho

w p

robl

ems

with

agr

eem

ent (

num

ber,

gend

er,

case

, etc

.)?

Yes

Agre

emen

tNo

Go

to F

30

F30

Doe

s th

e co

nten

t use

an

inco

rrec

t ve

rbal

tens

e, m

ood,

or a

spec

t?

Yes

Tens

e/m

ood/

aspe

ctNo

Wor

d fo

rm

F31

Are

wor

ds in

the

wro

ng o

rder

?

Yes

Wor

d or

der

NoG

o to

F32

F32

Are

func

tions

wor

ds (s

uch

as a

rtic

les,

“hel

per v

erbs

”, or

pre

posi

tions

) use

d in

corr

ectly

?

Yes

Func

tion

wor

dsNo

Gram

mar

F33

Doe

s th

e co

nten

t vio

late

loca

le-

spec

ific

conv

entio

ns (i

.e.,

it is

fine

for

the

lang

uage

, but

not

for t

he ta

rget

lo

cale

)?

Yes

Go

to F

34No

Go

to F

40

F34

Are

dat

es s

how

n in

the

wro

ng

form

at fo

r the

targ

et lo

cale

(e.g

., D

-M-Y

whe

n Y-

M-D

is e

xpec

ted)

?

Yes

Date

form

atNo

Go

to F

35

F35

Are

tim

es in

the

wro

ng fo

rmat

for

the

targ

et lo

cale

(e.g

., A

M/P

M w

hen

24-h

our t

ime

is e

xpec

ted)

?

Yes

Tim

e fo

rmat

NoG

o to

F36

F36

Are

mea

sure

men

ts in

the

wro

ng

form

at fo

r the

targ

et lo

cale

(e.g

., m

etric

uni

ts u

sed

whe

n Im

peria

l are

ex

pect

ed)?

Yes

Mea

sure

men

t for

mat

NoG

o to

F37

F37

Are

num

bers

form

atte

d in

corr

ectly

fo

r the

targ

et lo

cale

(e.g

., co

mm

a us

ed a

s th

ousa

nds

sepa

rato

r whe

n a

dot i

s ex

pect

ed)?

Yes

Num

ber f

orm

atNo

Go

to F

38

F38

Doe

s th

e co

nten

t use

the

wro

ng

type

of q

uote

mar

k fo

r the

targ

et

loca

le (e

.g.,

sing

le q

uote

s w

hen

doub

le q

uote

s ar

e ex

pect

ed)?

Yes

Quot

e m

ark

type

NoG

o to

F39

F39

Doe

s th

e co

nten

t vio

late

any

re

leva

nt n

atio

nal l

angu

age

stan

dard

s (e

.g.,

usin

g di

sallo

wed

w

ords

from

ano

ther

loca

le)?

Yes

Nati

onal

lang

uage

stan

dard

NoLo

cale

conv

enti

on

F40

Doe

s th

e co

nten

t use

an

inco

rrec

t ch

arac

ter e

ncod

ing?

Yes

Char

acte

r enc

odin

gNo

Go

to F

41

F41

Doe

s th

e co

nten

t use

cha

ract

ers

that

are

not

allo

wed

acc

ordi

ng to

sp

ecifi

catio

ns?

Yes

Nona

llow

ed ch

arac

ters

NoG

o to

F42

F42

Doe

s th

e co

nten

t vio

late

a fo

rmal

pa

tter

n (e

.g.,

regu

lar e

xpre

ssio

n)

that

defi

nes

wha

t the

con

tent

may

co

ntai

n?

Yes

Patt

ern

prob

lem

NoG

o to

F43

F43: Is content sorted incorrectly for the target locale and sorting type? Yes → Sorting; No → Go to F44

F44: Is the content inconsistent with a corpus of known-good content? (Note: almost always determined by a computer program.) Yes → Corpus conformance; No → Go to F45

F45: Are links or cross-references broken or inaccurate? Yes → Go to F46; No → Go to F47

F46: Are internal links or cross-references broken or inaccurate? Yes → Document-internal; No → Document-external

F47: Are there problems with an index or Table of Contents (ToC)? Yes → Go to F48; No → Go to F51

F48: Are page references in an index or Table of Contents (ToC) incorrect? Yes → Page references; No → Go to F49

F49: Is the format of an index or Table of Contents (ToC) incorrect? Yes → Index/TOC format; No → Go to F50

F50: Are items missing from an index or Table of Contents (ToC)? Yes → Missing/incorrect item; No → Index/TOC

F51: Is content unintelligible (i.e., the fluency is bad enough that the nature of the problem cannot be determined)? Yes → Unintelligible; No → Fluency

V1: Is the content unsuitable for the end-user (target audience)? Yes → End-user suitability; No → Go to V2

V2: Is the content incomplete or missing needed information? Yes → Go to V3; No → Go to V5

V3: Are lists within the content incomplete or missing needed information? Yes → Lists; No → Go to V4

V4: Are procedures described within the content incomplete or missing needed information? Yes → Procedures; No → Completeness

V5: Does the content violate any legal requirements for the target locale or intended audience? Yes → Legal requirements; No → Go to V6

V6: Does the content inappropriately include information that does not apply to the target locale or that is otherwise inaccurate for it? Yes → Locale-specific content; No → Verity

D1: Does the formatting issue apply globally to the entire document? Yes → Go to D2; No → Go to D8

D2: Are colors used incorrectly? Yes → Color; No → Go to D3

D3: Is the overall font choice incorrect? Yes → Global font choice; No → Go to D4

D4: Are footnotes/endnotes formatted incorrectly? Yes → Footnote/endnote format; No → Go to D5

D5: Are margins for the document incorrect? Yes → Margins; No → Go to D6

D6: Are widows/orphans present in the content? Yes → Widows/orphans; No → Go to D7

D7: Are there improper page breaks? Yes → Page break; No → Overall design (layout)

D8: Is local formatting (within content) incorrect? Yes → Go to D9; No → Go to D17

D9: Is text aligned incorrectly? Yes → Text alignment; No → Go to D10

D10: Are paragraphs indented improperly or not indented when they should be? Yes → Paragraph indentation; No → Go to D11

D11: Are fonts used incorrectly within content (rather than globally)? Yes → Go to D12; No → Go to D15

D12: Are bold or italic used incorrectly? Yes → Bold/italic; No → Go to D13

D13: Is a wrong font size used? Yes → Wrong size; No → Go to D14

D14: Are single-width fonts used when double-width fonts should be used (or vice versa)? (Applies to CJK text only.) Yes → Single/double-width; No → Font

D15: Is text kerning (space between letters) incorrect (text too tight/too loose)? Yes → Kerning; No → Go to D16

D16: Is the leading (line spacing of text) incorrect (e.g., double spacing when single spacing is expected)? Yes → Leading; No → Local formatting

D17: Is translated text missing from the layout (i.e., it has been translated but is not visible in the formatted version)? Yes → Missing text; No → Go to D18

D18: Is markup (e.g., formatting codes) used incorrectly or in a technically invalid fashion? Yes → Go to D19; No → Go to D24

D19: Is markup used inconsistently (e.g., <i> is used in some places and <em> in others)? Yes → Inconsistent markup; No → Go to D20
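A D19-style consistency check can be approximated mechanically, for instance by counting competing tag variants in the target text. A rough sketch under our own assumptions (a real checker would parse the markup properly rather than use regexes):

```python
import re
from collections import Counter

# Pairs of tags that express the same formatting; mixing the members of a
# pair within one document suggests an 'Inconsistent markup' issue (D19).
EQUIVALENT_TAGS = [("i", "em"), ("b", "strong")]

def inconsistent_tag_pairs(html):
    """Return the tag pairs whose members both occur in the text."""
    counts = Counter(m.group(1).lower()
                     for m in re.finditer(r"<\s*(\w+)[^>]*>", html))
    return [pair for pair in EQUIVALENT_TAGS if all(counts[t] for t in pair)]
```

For example, `inconsistent_tag_pairs("<i>one</i> and <em>two</em>")` reports the `i`/`em` mixture, while a text using only `<i>` passes.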

D20: Does markup appear in the wrong place within content? Yes → Misplaced markup; No → Go to D21

D21: Has markup been inappropriately added to the content? Yes → Added markup; No → Go to D22

D22: Is needed markup missing from the content? Yes → Missing markup; No → Go to D23

D23: Does markup appear to be incorrect? (Note: generally detected by computer processes.) Yes → Missing markup; No → Markup

D24: Are there problems with graphics and/or tables? Yes → Go to D25; No → Go to D28

D25: Are graphics or tables positioned incorrectly on the page or with respect to surrounding text? Yes → Position; No → Go to D26

D26: Are graphics or tables missing from the text? Yes → Missing graphic/table; No → Go to D27

D27: Are there problems with call-outs or captions for graphics or tables? Yes → Call-outs and captions; No → Graphics and tables

D28: Are portions of text invisible due to text expansion? Yes → Truncation/text expansion; No → Go to D29

D29: Is text longer than is allowed (but remains visible)? Yes → Truncation/text expansion; No → Length

Multidimensional Quality Metrics (MQM): Full Decision Tree
http://www.qt21.eu
MQM definition: http://qt21.eu/mqm-definition/

The Multidimensional Quality Metrics (MQM) framework provides a hierarchical categorization of the error types that occur in translated or localized products. Based on a detailed analysis of existing translation quality metrics, it provides a flexible typology of issue types that can be applied to analytic or holistic translation quality evaluation tasks. Although the full MQM issue tree (which, as of November 2014, contains 115 issue types categorized into five major branches) is not intended to be used in its entirety for any particular evaluation task, this overview chart presents a "decision tree" suitable for selecting an issue type from it. In practical terms, however, an individual metric would have a smaller decision tree covering just the issues contained in that metric. To use the decision tree, start with the first question and follow the appropriate answers until a specific issue type is reached.
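The walk just described (start with the first question, follow the answers to an issue type) is easy to mechanize. A minimal sketch in which the node names follow the chart but the question wording is abbreviated and only a fragment of the tree is encoded:

```python
# Each node maps to (question, answer -> next node or final issue type).
# Node IDs follow the chart; the question texts are abbreviated here.
TREE = {
    "General 1": ("Meaning differs between source and target?",
                  {"yes": "Accuracy", "no": "General 2"}),
    "General 2": ("Linguistic/mechanical formulation issue?",
                  {"yes": "Fluency", "no": "General 3"}),
}
LEAVES = {"Accuracy", "Fluency", "General 3"}  # "General 3" stands in for the rest

def select_issue(answers, start="General 1"):
    """Follow yes/no answers from the start node until an issue type is reached."""
    node = start
    while node not in LEAVES:
        question, branches = TREE[node]
        node = branches[answers[node]]
    return node

# Example: a target-text typo changes no meaning but is a formulation issue.
select_issue({"General 1": "no", "General 2": "yes"})  # -> "Fluency"
```

An individual metric would simply supply a smaller `TREE` containing only its own issue types, which is exactly the pruning the paragraph above describes.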

General 1: Is the issue related to a difference in meaning between the source and target? Yes → Go to Accuracy; No → Go to General 2

General 2: Is the issue related to the linguistic or mechanical formulation of the content? Yes → Go to Fluency; No → Go to General 3

General 3: Is the issue related to the appropriateness of the content for the target audience or locale (separate from whether it is translated correctly)? Yes → Go to Verity; No → Go to General 4

General 4: Is the issue related to the presentational/display aspects of the content? Yes → Go to Design; No → Go to General 5

General 5: Is the issue related to whether or not the content was set up properly to support subsequent translation/adaptation? Yes → Go to Internationalization; No → Go to General 6

General 6: Is the issue addressed in the Compatibility branch? Yes → Go to Compatibility; No → Other

Note: If the issue is found in the Compatibility branch, it is probably not a translation product issue but a process issue, and would normally not be addressed in MQM. If the issue is Other, it may not be a translation-related issue, since translation-related issues would normally fall into one of the major branches.

Accuracy