WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Nov 19, 2014

WeMT Tools and Processes, a presentation by Olga Beregovaya at Localization World 2013 in Silicon Valley. Presented during TAUS Showcase. Discussion of automation and machine translation programs. Welocalize is the leader in localization and translation solutions.
Transcript
Page 1: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

WeMT Tools and Processes
TAUS Showcase October 2013
By Olga Beregovaya

copyright © welocalize 2013. all rights reserved. www.welocalize.com

Page 2: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

We’ll talk about:

• MT Programs
• Metrics
• Engines
• Language Tools


Page 3: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Current MT Programs

Dell – 27 languages
Autodesk – 11 languages
PayPal – 8 languages
Cisco – 17 languages between 3 tiers
Intuit – 20+ languages
Microsoft (pre-project support)
McAfee (pilot)
… many more in pilot stage

Page 4: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

MT Program: Path-to-Success Components

A set of MT engines – “mix and match”

TMT Selection Mechanisms

Post-editing Environment

Processes and metrics

Data gathering and reporting tool – what, how much, how fast and at what effort

EDUCATION EDUCATION EDUCATION

CHANGE

The recipe for success

Page 5: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Process and Workflow

All aspects of the localization ecosystem are taken into consideration

Selecting the right MT engine

By using our MT engine selection Scorecard we make sure all important KPIs are taken into consideration at selection time

Empowerment through education

Internal, by the use of customized Toolkits; external, through specialised Trainings.

MT KPIs:

• Productivity: Throughputs
• Productivity: Delta
• Quality: LQA
• Quality: Automatic Scores
• Cost
• GlobalSight: Connectivity
• GlobalSight: Tagging
• Human Evaluation
• Customization: Internal/External
• Customization: Time

The feedback loop

Constructive communication from post-editor to MT provider

Page 6: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

o Source content classification (i.e. marketing/UI/UA/UGC)

o Length of the source segment

o Source segment morpho-syntactic complexity

o Presence/absence of pre-defined glossary terms or multi-word glossary elements, UI elements, numeric variables, product lists, ‘do-not-translate’ and transliteration lists

o Tag density - Metadata attributes and their representation in localization industry standard formats (“tags”)

o ROC – quality levels based on content use (“impact”)

3D Model: Expected productivity mapped to desired quality levels and source content complexity

MT Program Design - Source


Page 7: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Productivity - Throughputs: Number of post-edited words per hour

Productivity - Delta: Percentage difference between translation and post-editing time

Cost: Extrapolation, cost per word

CMS - Connectivity: Is there a connector in place?

Quality/Nature of source

Quality (Final) - LQA: Internal quality verification

Quality (MT) - Automatic Scores: A set of automatic scoring systems is used
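The two productivity KPIs are simple ratios, so a small worked sketch may help. The function names and the sample numbers are hypothetical, and the slide does not say which value the delta is measured against; this sketch assumes it is expressed relative to the translation-from-scratch time.

```python
def throughput(words_post_edited: int, hours: float) -> float:
    """Productivity - Throughputs: post-edited words per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return words_post_edited / hours


def productivity_delta(translation_hours: float, post_editing_hours: float) -> float:
    """Productivity - Delta: time saved by post-editing, as a percentage
    of the translation time (the baseline is an assumption)."""
    if translation_hours <= 0:
        raise ValueError("translation_hours must be positive")
    return (translation_hours - post_editing_hours) / translation_hours * 100.0


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(throughput(1200, 2.0))          # words per hour
    print(productivity_delta(4.0, 3.0))   # percent faster than translating
```

A positive delta means post-editing was faster than translating the same content from scratch; a negative one means the MT output slowed the linguist down.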

MT Engine Selection Scorecard

We have tested and used different engines so we’ve seen the good, the bad and the ugly; now we can better appreciate what we have

Page 8: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Scorecard - Metrics

Overall data

German

KPIs                               #1  #2  #3  #4
Productivity                        4   4   4   4
Productivity Increase               5   4   1   3
Quality - LQA                       2   2   1   2
Quality - Automatic Scores          3   3   3   3
Cost                                4   2   3   3
GlobalSight - Connectivity          4   3   2   4
GlobalSight - Tagging               4   2   4   2
Human Evaluation                    3   3   3   4
Customization - Internal/External   4   2   3   3
Customization - Time                3   1   2   1
Total                              36  26  26  29

French

KPIs                               #1  #2  #3  #4
Productivity                        4   5   3   4
Productivity Increase               5   5   1   4
Quality - LQA                       5   3   3   4
Quality - Automatic Scores          3   4   3   3
Cost                                4   2   3   3
GlobalSight - Connectivity          4   3   2   4
GlobalSight - Tagging               4   2   2   2
Human Evaluation                    3   3   3   3
Customization - Internal/External   4   2   3   3
Customization - Time                3   1   2   1
Total                              39  30  25  31

Productivity metrics

Automatic Scoring

Human Evaluation
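The "Total" row of the scorecard is the unweighted sum of the ten KPI scores for each engine, which can be checked with a short sketch. The dictionary layout and function names are hypothetical; the scores are copied from the first of the two score tables on this slide, which is assumed here to be the German one.

```python
KPIS = [
    "Productivity",
    "Productivity Increase",
    "Quality - LQA",
    "Quality - Automatic Scores",
    "Cost",
    "GlobalSight - Connectivity",
    "GlobalSight - Tagging",
    "Human Evaluation",
    "Customization - Internal/External",
    "Customization - Time",
]

# One list of scores per candidate engine, in the same order as KPIS.
GERMAN = {
    "#1": [4, 5, 2, 3, 4, 4, 4, 3, 4, 3],
    "#2": [4, 4, 2, 3, 2, 3, 2, 3, 2, 1],
    "#3": [4, 1, 1, 3, 3, 2, 4, 3, 3, 2],
    "#4": [4, 3, 2, 3, 3, 4, 2, 4, 3, 1],
}


def totals(scorecard):
    """Sum the KPI scores for every engine (no weighting applied)."""
    return {engine: sum(scores) for engine, scores in scorecard.items()}


def best_engine(scorecard):
    """Return the engine label with the highest total score."""
    sums = totals(scorecard)
    return max(sums, key=sums.get)


if __name__ == "__main__":
    for engine, total in totals(GERMAN).items():
        print(engine, total)
    print("best:", best_engine(GERMAN))
```

If some KPIs matter more for a given program, a per-KPI weight could be multiplied in before summing; the slide itself shows no weighting.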

Page 9: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Toolkits and Trainings

Our experience:

Most translators know and have experienced post-editing but they have limited knowledge of any other related aspect (automatic scoring, output differences between RBMT and SMT...)

The majority of people who work in localization have heard about MT but most of them still find it a daunting subject.

Our answer:

Continuous MT- and PE-related training and documentation for language providers

Customized Toolkits for different internal departments (Production, Quality, Sales, Vendor Management)


Page 10: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Transparency and Ownership

Theory – knowledge foundations

Practice – customized PE sessions for different client accounts

Transparency – process, engine selection/customization, evaluations

Responsibility – valid evaluations, constructive feedback, quality ownership

Training helps a lot - After I was told some of the background information and tips and tricks for certain engines/outputs, I was much more relaxed and happy to give MT a go.

Page 11: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Legacy data – best prediction tool > Statistics from legacy knowledge base

Page 12: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

The feedback loop

Engine retraining significantly improved the handling of tags and spaces around tags. This is a productive achievement, as it saves us a lot of manual corrections.

For me the biggest advantage would be the possibility to implement a client terminology list [in SMT]

I wish we could easily fix the corpus for outdated terminology and characters

Teach the engine to properly cope with sentences containing more than one verb and/or verbs in progressive form

Page 13: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Feedback and Engine Improvement

Page 14: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

“Beyond the Engine” Tools

• Teaminology - crowdsourcing platform for centralized term governance; simultaneous concordance search of TMs and term bases => clean training data

• Dispatcher - A global community content translation application that connects user generated content (UGC) including live chats, social media, forums, comments and knowledge bases to customized machine translation (MT) engines for real-time translation

• Source Candidate Scorer – scoring of candidate sentences against historically good and bad sentences based on POS and perplexity

• Corpus Preparation Toolkit – set of applications to maximize data preparation for MT engine training

Page 15: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Teaminology


Page 16: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Dispatcher

Page 17: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Source Candidate Scorer


Compares your source content to “the good” and “the bad” legacy segments and estimates potential suitability for MT
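The slides do not show how the Source Candidate Scorer is implemented, so the following is only a rough sketch of the perplexity half of the idea: train one language model on "good" legacy segments and one on "bad" ones, then compare how surprised each model is by a new source sentence. The unigram model with add-one smoothing, the class and function names, and the sample sentences are all assumptions made for illustration; the POS-based features mentioned earlier are left out.

```python
import math
from collections import Counter


class UnigramModel:
    """Word-level unigram language model with add-one smoothing."""

    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # extra slot for unseen words

    def perplexity(self, sentence):
        words = sentence.lower().split()
        if not words:
            raise ValueError("empty sentence")
        log_prob = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab_size))
            for w in words
        )
        return math.exp(-log_prob / len(words))


def mt_suitability(sentence, good_model, bad_model):
    """Positive when the sentence resembles the 'good' segments more
    than the 'bad' ones (lower perplexity under the good model)."""
    return math.log(bad_model.perplexity(sentence)) - math.log(
        good_model.perplexity(sentence)
    )


if __name__ == "__main__":
    # Tiny made-up corpora; real legacy data would be far larger.
    good = UnigramModel(["click the save button", "select the file menu"])
    bad = UnigramModel(["lol dunno why it crashed again",
                        "omg this thing is so broken"])
    print(mt_suitability("click the file button", good, bad))
```

In practice a higher-order n-gram model and a held-out threshold would be more realistic; the unigram version is used here only to keep the sketch short.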

Page 18: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Corpus Preparation Suite

Variety of tools to prepare corpus for training MT engines such as:

• Deleting formatting tags from TMX
• Removing double spaces
• Removing duplicated punctuation (e.g. commas)
• Deleting segments where source = target
• Deleting segments containing only URLs
• Escaping characters
• Removing duplicate sentences
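The cleaning steps listed above can be sketched as a single pass over (source, target) pairs. This is not the Welocalize suite itself: the regular expressions, the function names, and the set of escaped characters are assumptions, and inline tags are treated as plain angle-bracket markup for simplicity.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                    # inline formatting tags
MULTISPACE_RE = re.compile(r" {2,}")               # runs of two or more spaces
DUP_PUNCT_RE = re.compile(r"([,;:!?])\1+")         # ",," -> ","
URL_ONLY_RE = re.compile(r"^(?:https?://|www\.)\S+$")

# Characters to escape; this particular mapping is an assumption.
ESCAPES = str.maketrans({
    "&": "&amp;", "|": "&#124;", "<": "&lt;",
    ">": "&gt;", "[": "&#91;", "]": "&#93;",
})


def clean_text(text):
    """Strip tags, collapse duplicated punctuation and double spaces."""
    text = TAG_RE.sub(" ", text)
    text = DUP_PUNCT_RE.sub(r"\1", text)
    text = MULTISPACE_RE.sub(" ", text)
    return text.strip()


def clean_corpus(pairs):
    """Return cleaned, escaped, de-duplicated (source, target) pairs."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = clean_text(source), clean_text(target)
        if not source or not target:
            continue                      # nothing left after cleaning
        if source == target:
            continue                      # untranslated segment
        if URL_ONLY_RE.match(source) or URL_ONLY_RE.match(target):
            continue                      # segment is only a URL
        pair = (source.translate(ESCAPES), target.translate(ESCAPES))
        if pair in seen:
            continue                      # duplicate sentence pair
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


if __name__ == "__main__":
    sample = [
        ("Click <b>Save</b>  now,, please", "Klicken Sie  jetzt auf <b>Speichern</b>"),
        ("http://example.com", "http://example.com/de"),
        ("OK", "OK"),
        ("Click Save now, please", "Klicken Sie jetzt auf Speichern"),
    ]
    for pair in clean_corpus(sample):
        print(pair)
```

Tags are removed before escaping so that the escaping step only touches characters that belong to the text itself.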


Page 19: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Corpus Preparation: TM Creator

TM Creator

Aggregates training data from various relevant sources

Page 20: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Corpus Preparation: TMX Splitter

Extracts the relevant training corpus based on the TMX metadata
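As a rough illustration of metadata-based extraction, the sketch below keeps only the translation units whose `<prop>` element matches a requested value. It is not the TMX Splitter tool: the function name, the `x-content-type` property, and the sample file are hypothetical, and only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <prop type="x-content-type">UI</prop>
      <tuv xml:lang="en-US"><seg>Save file</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Datei speichern</seg></tuv>
    </tu>
    <tu>
      <prop type="x-content-type">marketing</prop>
      <tuv xml:lang="en-US"><seg>Work smarter</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Intelligenter arbeiten</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_pairs(tmx_text, prop_type, prop_value, src_lang, tgt_lang):
    """Return (source, target) text pairs from units whose metadata matches."""
    root = ET.fromstring(tmx_text)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        props = {p.get("type"): (p.text or "") for p in tu.findall("prop")}
        if props.get(prop_type) != prop_value:
            continue  # unit belongs to a different content type
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags
                segs[tuv.get(XML_LANG, "").lower()] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


if __name__ == "__main__":
    print(extract_pairs(SAMPLE_TMX, "x-content-type", "UI", "en-US", "de-DE"))
```

Which metadata fields are available depends on how the translation memories were exported, so the property name would need to be adapted to the actual TMX files.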

Page 21: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Welocalize Moses Implementation

• Why? Far more control over engine quality since we can control corpus preparation and output post-processing

• Control over metadata handling
• Ties into our company open-source philosophy
• Have experienced personnel in-house
• Can extend and customize Moses functionality as necessary
• Have connector to TMS (GlobalSight)

RESULTS: In our internal tests with Moses/DoMT, we are getting automated scores similar to those of commercial engines for the languages into which we localize most. Human evaluators gave the same feedback.


Page 22: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

… And it works!

We are in the position to offer realistic discounts and aggressive timelines providing quality levels appropriate for the content


Page 23: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

“Work-in-progress” Projects

• Ongoing improvements to our adaptation of iOmegaT tool (Welocalize/CNGL)

• Industry Partner in CNGL “Source Content Profiler” project

• Adoption of TMTPrime (CNGL) - MT vs. Fuzzy Match selection mechanism

• Language and content-specific pre-processing for the in-house Moses deployment

• Teaminology – adding linguistic intelligence


Page 24: WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization World

Contact

[email protected]

speak MT - the language of the future

Welocalize, Inc.
www.welocalize.com
Headquarters
241 East 4th St. Suite 207
Frederick, Maryland 21701 USA
[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free
[f] +1.301.668.0335
[e] [email protected]
