Foundations Machine Translation Post-Editing Copyright: Welocalize, Inc. 2014. All Rights Reserved
Jan 13, 2015
machine.translation
machine.translation
Different use cases for MT (audience? perishability? visibility?)
• Contracts
• Patents
• Annual Reports
• Light Marketing
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization)
• e-Learning Content
• User Guides
• Internal Corporate Communications
• Wikis
• Knowledge Bases
• Proposals / Draft Applications
• User Generated Content
why.mt
For clients
– Increase throughputs and consistency
– Reduce cost of translation
– Content explosion due to Internet
– Most internet content is in English (user community is global)
– Desire to translate also “lower quality” content, such as User Generated Content (UGC) at a profitable price
– Quality of MT has improved (new technologies, lots of research)
For the translator
– Increase throughputs and consistency
– MT is likely to become commonplace, like TMs before
– More & more clients and LSPs use MT
– Be an early adopter
– MT and new forms of post-editing requirements are fast evolving
basic.concepts
MT in a nutshell
[…] Machine Translation provides a set of tools by which digital text is automatically translated from one language (e.g. English) into another language (e.g. Spanish).
Source: Systran user guide
There are 3 main types of MT systems with different underlying logics:
– Rules-based (RBMT)
– Statistical (SMT)
– Hybrid (SMT + RBMT)
Most systems used today are either statistical or hybrid. All system types can be customized for specific clients, incorporating client Translation Memories, basic preferences and/or terminology lists.
basic.concepts
Customizable MT systems (licensed or open source)
– Client-specific data: TMs, glossaries
– Domain-specific data: chemistry or mechanical or IT or…
– General language data: anything to “teach the system the basics on the language pair”, so all of: tourism, IT, automotive, literature, …
e.g. Google Translate and Bing would be baseline only
basic.concepts
Understanding statistical MT
For the translator, it is important to understand that SMT systems are based on algorithms calculating probabilities within a given set of data (bilingual and monolingual).
In other words, the system learns from legacy human translations (Translation Memories in our case) and calculates probabilities of most likely translations from these, without applying linguistic rules as such.
basic.concepts
The logic behind statistical machine translation (SMT)
Imagine the TM(s) as an aligned data corpus – example
Example: Terminology
The term click appears > 16 000 times in TM A
– In 90% of cases it is translated with fare clic
– in 10% as: selezionare, scegliere, …
The probability is high that the machine translation will be fare clic
…BUT, maybe…
The string click OK appears 500 times in TM A
– In 50% of cases it is translated with fare clic su OK
– in 50% as: selezionare OK
The probability is 50% that the machine translation will be selezionare OK
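The counting logic in this example can be sketched in a few lines of Python. This is only an illustration of relative-frequency estimation over aligned segment pairs; real SMT systems combine many more components, and this is not a description of any particular engine. The function name `translation_probabilities` and the toy corpus are hypothetical; the counts are scaled down so that the percentages match the slide, and the split of the remaining 10% between selezionare and scegliere is an assumption.

```python
from collections import Counter, defaultdict


def translation_probabilities(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }


# Hypothetical toy TM: counts are scaled down, only the ratios follow the slide.
toy_tm = (
    [("click", "fare clic")] * 90
    + [("click", "selezionare")] * 6      # assumed split of the remaining 10%
    + [("click", "scegliere")] * 4        # assumed split of the remaining 10%
    + [("click OK", "fare clic su OK")] * 50
    + [("click OK", "selezionare OK")] * 50
)

probs = translation_probabilities(toy_tm)
print(probs["click"])     # "fare clic" dominates
print(probs["click OK"])  # two candidates with equal probability
```

With evenly split data like the click OK case, either candidate may surface in the raw output, which is one reason inconsistent TMs tend to lead to inconsistent MT.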
typical.examples
The quality of raw MT output can vary. A distinction is typically made as follows:
– good > perfect to overall understandable and fairly fluent
– medium > contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
– poor > poor with regard to understandability and fluency
We carry out content evaluations to prevent content with overall poor MT output from going into production.
Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing.
typical.examples
The quality of raw MT output can vary. Example:
typical.examples
Know the patterns of MT output
Even “good” MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
– issues around capitalization
– punctuation (source punctuation is copied)
– spacing
– omissions/additions of text (usually different in nature to those in fuzzy matches)
– unknown/new words may be translated literally or be left in English
– word order: can be mirroring the source
– compound formation
– word form agreement
→ being aware of typical issues helps good post-editing
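Some of these patterns can be flagged automatically before a segment is read in detail. The following is a minimal sketch, assuming simple language-agnostic heuristics; the function name `flag_typical_issues`, the labels and the `min_len` threshold are hypothetical and not part of any Welocalize tool. Real QA checks would need language-specific rules (spacing before punctuation differs between languages, and product or UI names may legitimately stay in English).

```python
import re


def flag_typical_issues(source: str, target: str, min_len: int = 5) -> list[str]:
    """Return labels for simple warning signs in one source/MT segment pair."""
    issues = []
    src, tgt = source.strip(), target.strip()

    # Capitalization: first letters of source and target differ in case.
    if src and tgt and src[0].isalpha() and tgt[0].isalpha():
        if src[0].isupper() != tgt[0].isupper():
            issues.append("capitalization")

    # Spacing: doubled spaces, or a space before a comma or full stop.
    if re.search(r" {2,}| [,.]", target):
        issues.append("spacing")

    # Possibly untranslated: longer source words that reappear in the target.
    src_words = {w.lower() for w in re.findall(r"[^\W\d_]+", src) if len(w) >= min_len}
    tgt_words = {w.lower() for w in re.findall(r"[^\W\d_]+", tgt)}
    if src_words & tgt_words:
        issues.append("possibly untranslated")

    return issues


flags = flag_typical_issues(
    "Click the Settings button to continue.",
    "fare clic sul pulsante  Settings per continuare .",
)
print(flags)
```

A flag is only a prompt to look closer: the post-editor still decides whether the flagged item is an error under the client's requirements.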
post.editing
What is Post-Editing?
post.editing
In other words…
Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
Read the MT output & the source > decide quickly what can be used
Use as many “bits/sections” of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration
Look up key terms in your reference material as usual, but also learn to trust the customized output
Automate with customized QA checks
Adjust your expectations. Rethink your approach. Report recurring errors.
full.post.editing
Full post-editing: “publishable quality”
If the client requests “full post-editing”, this means publishable quality.
The post-editor is responsible for ensuring that the client requirements with regard to final quality expectations are met.
► Client Glossary, TM, Style Guide and others apply
Examples:
– infinitive / imperative preferences?
– passive / impassive preferences?
– formal / informal preferences?
– different styles for headers, lists, tables?
– special formatting of UI options? (bilingual, English)
– are measurements to be converted?
– Terminology
light.post.editing
Light post-editing / “understandable quality”

Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Terminology is understandable and actionable
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items, …) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements: 33 cm (13"); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13" (33 cm), English quotation marks, ...
… | …
post.editing
Light post-editing versus full post-editing
Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013
post.editing
Notes on productivity
Just as with human translation, throughput can vary and depends on:
– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material
With MT, additional factors are:
– quality of the MT
– experience with post-editing
Compared to average daily throughputs for human translation, average daily throughputs for full post-editing can be up to 3x higher.
take.aways
– There are different use cases of MT associated with different levels of final (post-edited) quality
– When full PE is requested, this means publishable quality
– There are different MT systems; Welocalize works with a range of them
– MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
– MT is not expected to be perfect, that's why we need post-editors!
– Post-editing replaces the translation stage in the workflow, but it is a different task, cognitively
– MT systems can improve through adding more data & through constructive feedback from post-editors
trademark.disclaimer:
Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.
Questions?
Contact the Welocalize Language Tools [email protected], [email protected]
Welocalize
Frederick, Maryland - Headquarters
241 East 4th St. Suite 207
Frederick, Maryland 21701 USA
[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free
www.welocalize.com