ADVANTAGES AND DISADVANTAGES OF TRANSLATION MEMORY:
A COST/BENEFIT ANALYSIS
by
Lynn E. Webb
BA, San Francisco State University, 1992
Submitted in partial satisfaction of the requirements for
TABLE OF CONTENTS
4 TEXTS THAT ARE CONDUCIVE TO USING TRANSLATION MEMORY
4.1 REUSABILITY
4.1.1 UPDATES
4.1.2 REVISIONS
4.1.3 “RECYCLING” PRIOR WORK
4.2 REPETITIVE CONTENT
5 KEY CONSIDERATIONS FOR DETERMINING THE COST-EFFECTIVENESS OF TRANSLATION MEMORY
5.1 TYPE OF PROJECT: INDIVIDUAL VS. TEAM
5.2 PERCENTAGE OF WORK IN HARD COPY VS. ELECTRONIC FORMAT
5.3 TYPE OF TEXT USUALLY TRANSLATED
5.4 TIME REQUIRED FOR CONVENTIONAL TRANSLATION PROCESS VS. TM PROCESS
5.5 COMPARING RATES
5.6 NEED TO INTEGRATE PRIOR WORK FOR PRESENT PROJECTS (ALIGNMENT)
5.7 NEED TO INTEGRATE PRESENT WORK FOR FUTURE PROJECTS
5.8 FREQUENCY OF UPDATES AND REVISIONS
6 EXAMPLES
6.1 INITIAL INVESTMENT
6.2 THE CLIENT
6.3 THE TRANSLATION AGENCY
6.4 THE FREELANCE TRANSLATOR
6.5 COMPANIES WITH IN-HOUSE TRANSLATION DIVISIONS
7 SURVEY/CASE STUDIES
7.1 SURVEY
7.2 CASE STUDIES
8 TM DATABASE OWNERSHIP
9 DRAWBACKS OF TRANSLATION MEMORY
10 TRANSLATION MEMORY PRODUCTS
10.1 STANDARD TRANSLATION MEMORY SOFTWARE
10.2 LOCALIZATION SOFTWARE WITH TM
10.3 TM/MT HYBRIDS
11 FINDING COMMON GROUND
12 FUTURE TRENDS
13 CONCLUSION
REFERENCES
ACKNOWLEDGEMENTS
When I first considered writing my thesis on translation memory, I wasn’t exactly
sure on what aspect I should focus. I would like to thank Chris Langewis for helping me
to narrow my topics down to the one presented in this thesis. I would also like to thank
him for his valuable input as my thesis advisor and as an expert in the field.
My survey would not have been a success without responses from the members
of LANTRA-L, CompuServe’s Foreign Language Education Forum (FLEFO), Interlang
and from fellow translators who responded to the various personal e-mails that I sent
out. Gerald Dennett, Mark Berry and Jeff Allen all deserve a special thanks for
contributing valuable information. I would also like to thank the various translation
memory software manufacturers, especially TRADOS Corporation, for their input and
responses to my questions.
Finally, I would like to thank William Webb for reading and editing and David
Sawyer and Frank Austermühl for reading and approving my final thesis. After all, if it
weren’t for this audience, I wouldn’t be able to share it with the wider audience—all of
you.
The figures and tables that are not displayed in this electronic document can be
found in the HTML pages (figures) and Excel spreadsheets (tables) included with this
document in the archive zip file.
1 INTRODUCTION
Many articles have been written about translation memory in the last few years.
Most of the material is provided by the producers of translation memory systems and
only covers a specific product or lists specific features of the technology. Some articles
even magnify the negative aspects of translation memory. Despite all that has been
written, not much has been said about the actual costs or potential savings involved
when using translation memory. It is becoming increasingly clear that translation
memory is here to stay and that it is serving a useful purpose, but exactly how is
this technology affecting the translation industry? Who can profit from it? Are there any
“losers”? This thesis will attempt to determine the applicability of translation memory
technology and illustrate the advantages and disadvantages of translation memory in
the form of a cost/benefit analysis from the point of view of the end-user.
For the purpose of this thesis, the “end-user” comprises freelance translators,
companies with in-house translation divisions, translation agencies and direct clients.
The key considerations for determining the cost-effectiveness of translation memory
and the cost/benefit analysis will be covered later in this text.
2 TRANSLATION MEMORY DEFINED
What is translation memory? Translation memory (TM) is defined by the Expert
Advisory Group on Language Engineering Standards (EAGLES) Evaluation Working
Group's document on the evaluation of natural language processing systems as “a
multilingual text archive containing (segmented, aligned, parsed and classified)
multilingual texts, allowing storage and retrieval of aligned multilingual text segments
against various search conditions.”1 In other words, translation memory (also known as
sentence memory) consists of a database that stores source and target language pairs
of text segments that can be retrieved for use with present texts and texts to be
translated in the future. The translator, a different translation memory system or a
machine translation system provides the target text segments that are paired with the
source text segments so that the end product is a quality translation.
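The definition above can be pictured as a very small data structure. The following is a minimal sketch in Python, assuming a purely in-memory store; the class name `TranslationMemory` and its methods are hypothetical and do not reflect the database format of any actual TM product. The sample segments are borrowed from the Dennett example quoted in section 3.1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TranslationMemory:
    """Toy translation memory: maps a source segment to its stored translation."""

    # Hypothetical in-memory store; real TM products use their own database formats.
    pairs: Dict[str, str] = field(default_factory=dict)

    def store(self, source: str, target: str) -> None:
        """Save a source/target segment pair for later reuse."""
        self.pairs[source] = target

    def retrieve(self, source: str) -> Optional[str]:
        """Return the stored translation of an identical segment, or None."""
        return self.pairs.get(source)


tm = TranslationMemory()
tm.store("Ein Messer ist im Schrank.", "A meter is in the cabinet.")

known = tm.retrieve("Ein Messer ist im Schrank.")   # segment already translated
unknown = tm.retrieve("Es ist sehr scharf.")        # segment never stored
```

In this sketch only identical segments are retrieved; a segment that was never stored simply yields nothing and must be translated by the translator.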
What distinguishes TM from other computer-assisted translation (CAT) tools?
There are many CAT tools available to assist the translator, such as bilingual and
multilingual dictionaries, grammar and spell checkers and terminology software, but TM
goes one step further by making use of these other CAT tools while at the same time
matching up the original source document stored in its database with the updated or
revised document through exact and fuzzy matching. Normally, the basic unit of text in
a TM database is a sentence; however, the TM user can define what the unit will be.
The basic unit might even be a sentence fragment or a paragraph. The translator does
not have to re-translate work he or she has already completed. Figure 1 illustrates the
basic translation memory process for creating a target language translation.
How does TM differ from machine translation (MT)? MT creates automated
translations and requires an advanced terminology database that includes all
grammatical elements of a language. The MT system uses comprehensive dictionaries
to translate the source text while at the same time applying the grammatical rules, or
rule sets, from the database in order to produce the resulting grammatically correct
target sentences. The technology sounds like an excellent solution; however, there is a
catch: the source and resulting target text segments are not stored away in a database
for future use. If a similar text (such as an automobile user’s manual for the same
model but different year) needs to be translated, the MT system would have to start
from scratch. On the other hand, a TM system is used as a translator’s aid, storing a
human translator’s text in a database for future use. TM can be used in a few different
ways. One way would be to have a translator or a machine translation system translate
the original text, using translation memory to store the paired source and target
segments. The translator could then reuse the stored texts to create the revised or
updated version of the text. Only the segments of the new text that do not match the old
one would have to be translated. The alternative would be to use an MT system or a
different TM system to translate the original. The new TM system could then be used by
a translator to translate the revision or update by aligning the texts produced by the MT
system or other TM system and storing them in the TM database for present and future
work. The translator could then proceed to translate only the new or changed segments
of the text, using TM as described above.
3 THE EFFECTS OF TRANSLATION MEMORY ON THE TRANSLATION PROCESS
3.1 THE TRANSLATION PROCESS
How does TM affect the conventional translation process? In order to answer this
question, we must have an idea of what takes place during this process. This section
will briefly touch on the general translation process and the features of TM used in this
process.
Figures 2 and 3 illustrate the conventional translation process and the translation
process using TM respectively. If it is possible to analyze quickly what type of text one
is translating and if the text suits translation memory, there are about the same number
of steps involved in both processes. One of the greatest differences, however, is that
once a translation has been performed using TM, not only is a glossary of terms
stored for future recall, but individual sentences are stored as well, thus cutting down
considerably on the time required for a future translation, update or revision.
Figure 4 illustrates the conventional translation process for a text that is being
revised. Figure 5 illustrates the same text using TM. Note that when using TM, fuzzy
and exact matching are performed using the translation memory program, allowing for
quick access to sections that have changed and permitting the user to focus on
translating only those changed sections.
Exact matching is the process by which the TM program pairs text segments in a
revised source text with identical segments in the original source text and reuses their
stored translations; any segment that does not match the original exactly is left
untranslated by this step. Fuzzy
matching is the process by which the TM program pairs text segments in a revised
source text with similar text segments from a previously stored translation based on the
original source text. Fuzzy matching will find segments that are very similar to the
original and suggest the original translation. This function can be set to different levels
of sensitivity, allowing the translator to “match” source text segments that may differ
only slightly or segments that vary greatly, but still have some similarities. After exact
and fuzzy matching, the translator can modify the remaining segments that reflect the
changes between the original and revised texts without having to retranslate the entire
document (see Figure 6).
Fig. 6: Exact and Fuzzy Matching
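The difference between exact and fuzzy matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity score comes from `difflib.SequenceMatcher` in the Python standard library, which stands in for whatever measure a commercial TM program actually uses, and the function name `lookup`, the sample segments and the 0.75 threshold are hypothetical. The `threshold` parameter plays the role of the sensitivity setting described above.

```python
from difflib import SequenceMatcher

# Hypothetical stored pairs (English source, German target), for illustration only.
MEMORY = {
    "Press the red button to stop the engine.":
        "Drücken Sie den roten Knopf, um den Motor zu stoppen.",
    "Check the oil level once a month.":
        "Prüfen Sie den Ölstand einmal im Monat.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (match_type, stored_source, stored_target, score) for one segment."""
    # Exact match: the segment is identical to a stored source segment.
    if segment in memory:
        return ("exact", segment, memory[segment], 1.0)

    # Fuzzy match: find the most similar stored source segment.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    # Suggest the stored translation only if it is similar enough.
    if best_source is not None and best_score >= threshold:
        return ("fuzzy", best_source, memory[best_source], best_score)
    return ("none", None, None, best_score)


exact_hit = lookup("Check the oil level once a month.", MEMORY)
fuzzy_hit = lookup("Press the green button to stop the engine.", MEMORY)
no_hit = lookup("Thank you.", MEMORY)
```

A fuzzy hit returns the old translation as a suggestion only; the translator still has to edit it (here, "red" has become "green"). Lowering `threshold` produces more suggestions that need heavier editing, while raising it produces fewer suggestions that are closer to the new segment.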
In addition to matching source text segments, fuzzy matching can also be used
to find terminology in the terminology database that is very similar to terminology being
used for a translation. For example, if the term “communicate” is in the terminology
database, the translation of “communicate” will be suggested whenever the terms
“communicated” or “communication” appear in the original text. The translator can then
enter the correct form of the word accordingly.
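The “communicate” example can be sketched as follows. The sketch assumes a tiny English-to-German terminology database and uses `difflib.get_close_matches` from the Python standard library as a stand-in for a TM product's own term recognition; the name `suggest_term`, the sample entries and the 0.8 cutoff are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical English-to-German terminology database, for illustration only.
TERM_DB = {
    "communicate": "kommunizieren",
    "cabinet": "Schrank",
}


def suggest_term(word, term_db, cutoff=0.8):
    """Return (stored_term, translation) for the closest stored term, or None."""
    candidates = get_close_matches(word.lower(), list(term_db), n=1, cutoff=cutoff)
    if not candidates:
        return None
    term = candidates[0]
    return (term, term_db[term])


from_past_tense = suggest_term("communicated", TERM_DB)
from_noun = suggest_term("communication", TERM_DB)
unrelated = suggest_term("engine", TERM_DB)
```

Both inflected forms lead back to the stored entry for “communicate”, while an unrelated word yields no suggestion. As in the text, the suggestion is the translation of the base term, so the translator still has to supply the correct form.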
Although fuzzy matching is quite useful, the user must also be aware of
problems that may arise during post-editing of matched text segments. Gerald Dennett
explains in his thesis entitled “Translation Memory: Concepts, products, impact and
prospects”:
Take the German sentence pairs:
1. “Ein Messer ist im Schrank. Er mißt Elektrizität.”
2. “Ein Messer ist im Schrank. Es ist sehr scharf.”
Imagine that the translator has translated a document containing sentence pair 1 and has thus stored in his Translation Memory the two segments: “A meter is in the cabinet.” And “It measures electricity.” The syntactical and contextual information supplied by the second sentence indicates to the translator that the word “Messer” here refers to a meter. The translator then runs a text containing sentence pair 2 through the pre-translation routine in his Translation Memory software. The Translation Memory software will recognise a 100% match in the first part of the pair, and insert “A meter is in the cabinet.” in the translation. A human translator would immediately realise from the syntactical and contextual information supplied in the second part of the pair that here in German word “Messer” is of neuter gender, and hence means “knife”. The translator must hope that he can pick up such mistranslations in his proof-reading.2
On the other hand, the likelihood that the above sentences would appear in the same
document is quite low, especially since they would probably be used in
completely different domains or in different types of text.
2 Dennett 33.
The alignment tool is an example of a CAT tool that is almost indispensable
when initially integrating older translations into TM. Using the alignment tool with TM
can save the user time on future projects. Figure 7 illustrates the process of alignment.
Alignment involves matching the electronic source/target texts by aligning matching
source/target text segments. The translator is essentially building a TM database that is
identical to a normal TM database built during the translation process. This process is
performed when it is clear that the source text will be revised or updated in the future
but was originally translated using the conventional translation process. If the source or
target texts are in hard copy, one should seriously consider how likely it is that the text
will require future updates or revisions before performing an alignment.
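The core idea of alignment can be sketched as follows. This is a deliberately naive illustration, assuming that the source and target texts contain the same number of sentences in the same order; actual alignment tools also handle sentences that were split, merged or reordered in translation. The function names `split_sentences` and `align` are hypothetical, and the sample text is taken from the Dennett example above.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def align(source_text, target_text):
    """Pair source and target sentences one-to-one to seed a TM database."""
    source_segments = split_sentences(source_text)
    target_segments = split_sentences(target_text)
    if len(source_segments) != len(target_segments):
        # A real alignment tool would let the user repair the mismatch by hand.
        raise ValueError("segment counts differ; manual alignment needed")
    return list(zip(source_segments, target_segments))


pairs = align(
    "Ein Messer ist im Schrank. Er mißt Elektrizität.",
    "A meter is in the cabinet. It measures electricity.",
)
```

The resulting list of pairs has the same shape as the entries a TM database would have collected had the original translation been done in TM.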
3.2 MANAGING THE TRANSLATION PROCESS
Perhaps the most intriguing aspect of translation memory is its ability to aid the
user in managing projects, coordinating team efforts and building glossaries and
dictionaries. Following are some additional features of TM that allow the translator or
other user to manage translation projects more efficiently.
3.2.1 INTERNAL ATTRIBUTES
Most TM products not only store language pairs; they also store other
information, called attributes, with the pairs. The most common attributes stored include
the creation date, the name of the user or creator, the client, the project ID and the
main domain or field (e.g., legal, technical, etc.) of the translation. Once this information
is stored with the translated segments, the translator or other user can filter the text for
the most important attributes. For example, the user can look for similar text segments
by project, client, etc. when performing fuzzy matching, or a project manager may have
more control over accountability for translated texts by filtering for creation date or the
name of the creator of the translated segments. The latter is particularly useful when a
number of translators are working on one large project, especially when the translators
are all working with the same language pair.
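Filtering by attributes can be sketched with a small record type. The field names below follow the attributes listed above (creation date, creator, client, project ID and domain), but the class `TMEntry`, the function `filter_entries` and all sample values are hypothetical and are not drawn from any particular TM product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TMEntry:
    """One stored segment pair plus its attributes."""
    source: str
    target: str
    created: date
    creator: str
    client: str
    project_id: str
    domain: str


def filter_entries(entries, **criteria):
    """Keep the entries whose attributes equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, name) == value for name, value in criteria.items())
    ]


# Hypothetical sample data, for illustration only.
ENTRIES = [
    TMEntry("Ein Messer ist im Schrank.", "A meter is in the cabinet.",
            date(1998, 9, 2), "translator_a", "Client X", "P-001", "technical"),
    TMEntry("Er mißt Elektrizität.", "It measures electricity.",
            date(1998, 9, 2), "translator_b", "Client X", "P-001", "technical"),
    TMEntry("Es ist sehr scharf.", "It is very sharp.",
            date(1998, 10, 3), "translator_a", "Client Y", "P-002", "marketing"),
]

by_client = filter_entries(ENTRIES, client="Client X")
by_creator = filter_entries(ENTRIES, creator="translator_a", domain="technical")
```

A project manager could use a filter on `creator` to review one translator's segments, while a translator could restrict fuzzy matching to the entries returned for a single client or project.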
3.2.2 TERMINOLOGY DATABASES
Most TM products come with a terminology database so that the translator can
take full advantage of all of the features of TM. Using an integrated terminology
database allows a translator to perform fuzzy matching for a specific term or to use a
term in the database suggested by TM. Without a terminology database that is
compatible with translation memory, the TM user cannot easily obtain suggested
translations for individual words without opening a separate electronic dictionary or
looking through a conventional dictionary. Naturally, the user must enter the
terminology into the database before it can be useful. Once the terms are in the
database, however, an individual translator or team of translators can work on a project
and receive the suggested terms from the database, maintaining terminological
consistency throughout the translation.
3.2.3 ANALYSIS
Estimating in advance approximately how much time a project will take is not
always an easy task. If the translation memory system is a good one, it will
have the capability of analyzing a document for similar sentences and text repetition. It
will also provide raw word counts, ignoring elements like graphics, HTML tags, software
code, etc. that could influence the count. This analysis makes it easier for the translator
or project manager to assess whether or not translation memory will be useful for the
project and also helps him or her determine how much time may be involved in
translating the document, depending on the amount of repetition, the word count, etc.
The user may also use the analysis function to compare different documents for
similarities. Analysis can reveal if one document that has been translated previously
and a newer document are in any way similar. Depending on how similar the two
documents are, the user can estimate the time required for translation.
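A rough idea of what such an analysis computes can be sketched as follows. The sketch assumes that tags look like HTML tags and that sentences end in a period, exclamation mark or question mark; the function name `analyze` and the sample document are hypothetical, and a real analysis function would also count fuzzy repetitions and matches against an existing TM database.

```python
import re
from collections import Counter


def analyze(text):
    """Return a raw word count and internal repetition figures for a document."""
    # Drop HTML-style tags so that markup does not inflate the word count.
    plain = re.sub(r"<[^>]+>", " ", text)
    parts = re.split(r"(?<=[.!?])\s+", plain.strip())
    segments = [part.strip() for part in parts if part.strip()]
    counts = Counter(segments)
    # Every occurrence of a segment after its first one counts as a repetition.
    repeated = sum(n - 1 for n in counts.values())
    return {
        "words": len(plain.split()),
        "segments": len(segments),
        "repeated_segments": repeated,
        "repetition_rate": repeated / len(segments) if segments else 0.0,
    }


report = analyze("<p>Press Enter. Press Enter. Wait for the prompt.</p>")
```

In the sample document one of the three segments repeats an earlier one, so about a third of the segments would not need to be translated again. The higher this rate, the more a project stands to gain from translation memory.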
4 TEXTS THAT ARE CONDUCIVE TO USING TRANSLATION MEMORY
4.1 REUSABILITY
The most important characteristic of a text that is conducive to translation
memory is that the text will be reused in one way or another. Following are examples of
how texts can be reused and how translation memory becomes involved in the process.
4.1.1 UPDATES
It is not uncommon for an update of the text being translated to be made available
to the translator suddenly during the translation process. An update is a
change in a source text that occurs while the translation is still in progress. Receiving an
updated text can cause major difficulties for the translator if the text is large and
changes have been made throughout the entire document. Figures 4 and 5 illustrate
the update/revision process without and with translation memory. Making updates using
translation memory has the advantage over the conventional update process in that the
translator does not have to physically search through the entire document for changes.
Instead, the translator only has to run the updated source text through the translation
memory program to identify new or changed segments and any new terminology. New
terminology can be entered into the terminology database by the translator for future
use.
Keep in mind that in order for translation memory to be effective, all work must
be done in TM and saved in TM format. Anything done outside of TM will not be stored
in the memory database and therefore will not be a translation that can be manipulated
in the future, unless one has access to an alignment tool. The best way to approach TM
is to think about it as being an integral part of the main word processor, just like the
word processor’s spell checker. If the TM system is a stand-alone product, always keep
a copy of the text file that retains the TM product’s file format.
A translator can even begin the translation process before the final original
document is completed. If the translator is given drafts of the original document in its
early stages of development, the text can be translated and stored in the TM database.
Then, as updated sections of the text are made available, the translator can perform
fuzzy and exact matching, thus isolating the new parts from the parts that have already
been translated or that are similar to the original. Section 6.5 is an example of this
process.
4.1.2 REVISIONS
Many translators find that they continually receive revisions from the same
clients. A revision is a new project amending a prior translation, reflecting changes
made to a prior source text. Often a translator is asked by a client to revise the
translation of a manual for the current product model that will be released within a short
period of time. The client wants the translated manual to be available at the same time
that the product is launched on the market. If the translator were to use the
conventional translation process, it could take months before a very large document
would be ready, and the client might not have that much patience or time. If, however,
the translator uses translation memory, he or she can analyze what has changed within
the document and can provide the revised translation of the manual within a shorter
period of time than if he or she had used the conventional process. Section 6.4
illustrates this process.
4.1.3 “RECYCLING” PRIOR WORK
At times, a translator may find that he or she is translating a text very similar to
one that had been translated in the past. The translator may run across words or
phrases that are almost identical to words or phrases in the older document. The odds
that a translator will ever translate the same sentence twice in two different texts are very
low; however, the odds are higher that a translator will run across similar phrases or
words in texts within the same field and/or for the same client. If the translator has an
electronic copy of the target and source texts from the previous translation, then he or
she can quickly access the files and perform fuzzy matching with the new source text
against the old source and target texts.
4.2 REPETITIVE CONTENT
Another important factor is whether or not there is repetitive content within a text.
The higher the percentage of repetitive content within a text, the more desirable it is to
use translation memory. Repetitive content may include words, phrases or entire
paragraphs. There are a number of different text types, but some tend to have more
repetitive content than others. The majority of translatable texts fall into the following
categories3:
� Correspondence
� Journalism/Communication
� Business/Commercial
� Marketing
� Advertising
� Administration
� Legal
� Scientific
� Technical
� Culture
� Literature
The types of texts that are usually suited for translation memory are marked with
the "�" symbol. Interestingly, according to the Telecom Observer, “each year 450
million pages of scientific, technical, and commercial materials are translated world-
wide.”4 Some examples of the type of texts that fall into these categories include:
[...]
be developed and maintained. However, there are really tricky issues if [there is a]
mixing of company resources (which could be domain specific) and client-specific
[databases].”13
If a TM database has been created for a specific client, then it is best for the
translator to assume that the database is the client’s property, unless an agreement has
been made otherwise. This also means that the translator should not use this client’s
database on projects for other clients. The bottom line appears to be that the translator
and client need to agree on who owns the database by making it part of the business
contract.
9 DRAWBACKS OF TRANSLATION MEMORY
Translation memory, like every other good invention, is bound to have
drawbacks. Four particular issues come to mind:
1. Any post-editing of a translation cannot be easily integrated into a TM
database if performed outside of the database. If the translator creates a
draft translation in TM and then exports the document in a different
format, any corrections made to the exported document cannot be
captured by TM. This problem can be remedied by making all corrections
to the document within the TM system or by aligning the post-edited text
with the source text.
13 Plumley, “RE: translation memory ownership.”
2. A translator working in TM is more likely to create only one or two drafts of a
translation, possibly affecting the quality of the final translation. Many
translators find that when they use TM, they are less
likely to “fine-tune” the translation. They may feel that the TM system is
less flexible than a word processing program. This may become less of an
issue as the translator uses the TM system more often. It is also possible
that these translators tend to work in fields that reward speed over accuracy.
3. Post-editing of pre-translations performed in TM can be quite time-
consuming and may take longer than if the translator had used the
conventional translation process to translate and edit the document. This
issue is largely related to instances in which many text segments appear
to be exact or fuzzy matches, but in reality are expressing completely
different ideas (see Gerald Dennett’s example in section 3.1).
4. Learning how to use the TM program thoroughly may take some time,
which the translator may not have, especially if the translator is borrowing
the TM program for a specific project. Suzanne Falcone comments:
One problem with these programs is that they usually come at the same time as the first job, and the deadline for delivery obviously doesn’t take training into account. The translator rarely has time to read the user’s manual…, and sometimes can’t even install the program properly on the first (and even second) attempt… This is one point to take into account before accepting work with a program that the client is offering to provide.14
10 TRANSLATION MEMORY PRODUCTS
There are many different types of products on the market that feature translation
memory. Translation memory technology can be found as a product in and of itself or
REFERENCES
[LANTRA-L] Internet Languages Translation mailing list. <[email protected]>.
Alis. “Alis Translator for Lotus Domino.” <http://www.alis.com/altd/index.html?AlisFramesTgtDoc> (10 Oct. 1998).
Allen, Jeff. <[email protected]> “ISSUE: hour vs. word rates for repetitive work.” (2 Sept. 1998) E-mail in response to my e-mail sent to <[email protected]>.
Allen, Jeff. <[email protected]> “ISSUE: transit software.” (17 Sept. 1998) Forwarded e-mail from his response to <[email protected]> (30 May 1998) regarding Transit software.
Berry, Mark. Who Owns Translation Memory? MCB News No. 10, Fall, 1995. San Diego (CA): MCB Systems (1995).
Boitet, Christian. “Machine-aided Human Translation.” Survey of the State of the Art in Human Language Technology. (21 November 1995) Grenoble (France).
Brigham Young University Translation Resource Group. “Translation memory.” <http://humanities.byu.edu/trg/tm2.htm> (25 Aug. 1997).
ComStar. “Translation Memory.” Multilingual Software Digest. <http://www.gy.com/www/ww1/ww2/ibmt01.htm> (25 Aug. 1997).
Dennett, Gerald. Translation Memory: Concepts, products, impact and prospects. London (England): South Bank University, 1995.
Di Biasio, Diane. <[email protected]> “Re: discount for repetitive content.” (31 Aug. 1998) E-mail in response to my e-mail sent to <[email protected]>.
EAGLES (Evaluation of Natural Language Processing Systems). “Benchmarking translation memories.” Doc. EAG-EWG-PR.2 <http://issco-www.unige.ch/ewg95> (Sept. 1995).
Everson, Andrene. Personal communication (3 Oct. 1998).
Falcone, Suzanne. “Translation Aid Software. Four Translation Memory Programs Reviewed.” Translation Journal Jan. 1998 <http://accurapid.com/journal/03tm2.htm> (1997).
Gesellschaft für Multilinguale Systeme GmbH. “Langenscheidt’s T1 Professional – Main Features.” <http://www.gmsmuc.de/english/t1/pfunc.html> (26 Sept. 1998).
Gordon, Ian and TRADOS UK Ltd. “Letting the CAT Out of the Bag – Or was it MT?” Kettleshulme (England) <http://www.trados.com> (21 Feb. 1997).