How to Successfully Integrate Machine Translation in your Company, by Diego Bartolome (@diegobartolome)

Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Jul 16, 2015

Technology

tauyou
Transcript
Page 1: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

How to Successfully Integrate Machine Translation in your Company

Diego Bartolome @diegobartolome

Page 2: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

and others

Page 3: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

70+ clients

18 countries

~700 Million words in 2014

All language pairs

Page 6: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

performance demanded in high-end markets

performance demanded in low-end markets

sustaining technology

disruptive technology

Page 7: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Objectives for Machine Translation

Productivity gains

Direct cost reduction

Quality consistency

Page 8: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

New uses for Machine Translation

Multilingual customer support

Social Media monitoring

Applications enabled by Big Data

Internet of Everything / Internet of Things

Speech-to-Speech translation

Page 9: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: First Round

What is your experience with MT?

1. Quality Metrics

2. Cost reduction

3. Impact on Delivery Times

4. Feedback from Post-editors

5. Your Feelings

Page 10: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Learning about Machine Translation

https://www.taus.net/think-tank/reports/translate-reports/taus-translation-technology-landscape-report

https://www.taus.net/think-tank/reports/translate-reports/moses-mt-market-report

http://www.lt-innovate.eu/resources/document/lt-20-13

http://www.gala-global.org/onDemand

Page 11: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Machine Translation Types

Page 12: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Google/Bing Translator vs. Moses

Advantages

Big(ger) data

State-of-the-art technology

Learning curve

Disadvantages

Black-box

Confidentiality

Control

Page 13: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Internal vs. external

Core competence

Resources

ROI

Time to market

Page 14: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Costs of Machine Translation

Internal development – people and time

Free tools – Google + Bing

DIY solutions

Traditional pricing model

tauyou managed solution

Page 15: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Revenue from Machine Translation

Translation as a Service

Private Machine Translation Portal

MT of internal communication (flat rate)

….

and many others!

Page 16: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: Round 2

1. Where do you provide value now?

2. Where do you think the value will be?

3. How important is confidentiality?

4. Do you care about control?

5. How much could you invest in MT (time, people, money)?

6. When will your solution be available?

Page 17: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

On Language Quality (I)

Source: translate.autodesk.com

Page 18: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

On Language Quality (II)

Source: Philipp Koehn

Page 19: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Some Languages Sorted

From EN into

1) FR, ES, PT, IT

2) DE, NL, HE

3) ZH, JA, KO

4) RU, AR, TR, HI

Page 20: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

On Domain Quality

Who is willing to pay?

Where does your revenue come from?

What are your key skills?

What domains achieve good quality?

… Quality Order of your domains ...

Page 21: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: Round 3

1. What is your main motivation?

2. Can you try more than 1 domain?

3. Can you train at least 2 language pairs?

4. Can you pilot several MT vendors?

5. What are your current expectations?

Page 22: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Data acquisition

OPUS corpora

http://opus.lingfil.uu.se/

WMT workshops

e.g. http://www.statmt.org/wmt13/

Multilingual websites

TAUS

Page 23: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Corpora building

Related vs. unrelated materials

Percentage of out-of-domain

Does mono-lingual data help?

Corpora extension with linguistic processing

Ad-hoc corpus for file translation

The more, the better?

Page 24: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Data cleaning

Clean translation memories

Length, punctuation, terminology, …

Inconsistencies, repetitions, ...

Segment splitting

Optimize weight of most frequent n-grams

Validate their translations

Add out-of-domain data (optimization)
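
The cleaning checks listed on this slide (length, punctuation, repetitions) could be sketched roughly as follows. This is only an illustration, not the process described in the talk: the function name clean_tm, the thresholds max_len and max_ratio, and the final-punctuation rule are all assumptions, and the punctuation rule in particular only makes sense for language pairs that share sentence-final marks.

```python
import re

def clean_tm(pairs, max_len=100, max_ratio=3.0):
    """Filter (source, target) segment pairs with simple heuristics.

    All thresholds are illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace before comparing anything.
        src = re.sub(r"\s+", " ", src).strip()
        tgt = re.sub(r"\s+", " ", tgt).strip()
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length mismatch
        if src[-1] in ".?!" and tgt[-1] not in ".?!":
            continue  # final punctuation lost in the target
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Terminology and inconsistency checks (the same source with several different targets) need client glossaries and human validation, so they are left out of this sketch.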

Page 25: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Remark

Data cleaning and selection is a key process

Just more data may harm the quality

Page 26: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Training strategies

One single system with all TMs

+ glossaries

+ linguistic processing input/output

+ forbidden words lists

Layered approach

Generic → domain → subdomain → client

Page 27: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Models optimization

Filter the translation tables

Remove the garbage + tune weights

Optimize language models

Adapt them to the translation purpose

Tune parameters correctly

Tune set, test set, optimization parameters

Improve tokenization, recasing, ...
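
Tuning and testing only give meaningful numbers when the tune set and the test set are disjoint from the training data. A minimal sketch of such a split is shown below; the function name split_corpus, the default set sizes, and the seed are assumptions chosen for illustration.

```python
import random

def split_corpus(pairs, tune_size=2000, test_size=2000, seed=13):
    """Split (source, target) tuples into disjoint train, tune and test sets."""
    # Drop exact duplicates first so a held-out pair cannot also sit in training.
    unique = list(dict.fromkeys(pairs))
    if len(unique) <= tune_size + test_size:
        raise ValueError("corpus too small for the requested held-out sets")
    # A fixed seed keeps the split reproducible between trainings.
    random.Random(seed).shuffle(unique)
    tune = unique[:tune_size]
    test = unique[tune_size:tune_size + test_size]
    train = unique[tune_size + test_size:]
    return train, tune, test
```

Keeping the same test set across retrainings makes scores comparable from one engine version to the next.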

Page 28: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Workflow integration

Use MT as a secondary TM

Bilingual pre-translated translation files

CAT tool integration

Differentiated workflow
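
One way to read "MT as a secondary TM" is: reuse a translation memory match when it is good enough, and fall back to MT otherwise. The sketch below assumes a TM held as a plain dictionary, a hypothetical mt callable standing in for whatever engine is used, and a character-based similarity from Python's difflib. CAT tools compute fuzzy match scores differently, so the 0.75 threshold is only a placeholder.

```python
from difflib import SequenceMatcher

def best_tm_match(source, tm):
    """Return (score, target) for the most similar TM source, score in 0..1."""
    best = (0.0, None)
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best[0]:
            best = (score, tm_target)
    return best

def pretranslate(source, tm, mt, threshold=0.75):
    """Use the TM above the threshold, otherwise call the MT engine."""
    score, target = best_tm_match(source, tm)
    if target is not None and score >= threshold:
        return target, "TM"
    return mt(source), "MT"  # mt is a hypothetical callable: str -> str
```

Returning the origin label ("TM" or "MT") alongside the text lets a pre-translated bilingual file mark which segments need post-editing rather than fuzzy-match review.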

Page 29: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Continuous improvement

Qualitative

Use updated TMs in new trainings

Immediate (incremental) retraining

Rule-based automatic post-editing

Selective pre- and/or post-processing

Source content optimization
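
Rule-based automatic post-editing can be as simple as an ordered list of search-and-replace rules applied to the MT output before it reaches the post-editor. The rules below are invented examples for English output (the "e-mail" to "email" rule stands in for a client style preference); real rules would be derived from corrections that recur in post-edited files.

```python
import re

# Hypothetical rules, applied in order.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),   # no space before punctuation
    (re.compile(r"\b(\w+) \1\b"), r"\1"),    # immediately repeated word
    (re.compile(r"\be-mail\b"), "email"),    # assumed client style rule
    (re.compile(r" {2,}"), " "),             # collapse repeated spaces
]

def auto_post_edit(segment):
    """Apply every rule in sequence to one MT output segment."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment.strip()

print(auto_post_edit("Please restart the the server  and check your e-mail ."))
# prints: Please restart the server and check your email.
```

Rules like these are language-specific, and the repeated-word rule would also collapse legitimate repetitions, so each rule is best validated on a sample before it is switched on.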

Page 30: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Linguistic processing notes

In the source and/or target language

Grammar checking

Entity detection

Proper nouns, alphanumeric words, ...

Compound word splitting

Sentence reordering

Page 31: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: Round 4

What is your preferred option?

How much can you invest in improvements?

Page 32: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

The Post-editor profile

Do the skills needed differ from those for translation?

Post-editing guidelines (TAUS)

Full vs. light post-editing

http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines

Compensation

Page 33: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: Round 5

Do you have the right resources to start?

Page 34: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Quality Metrics

SMT metrics: BLEU, NIST

Feedback from translators

Translation time vs. Post-editing time

Word Error Rate (WER) or Edit Distance

Cost reduction
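
Two of the metrics on this slide are easy to compute in-house. The sketch below shows a word-level edit distance between the raw MT output and its post-edited version, normalized by the post-edited length to give a Word Error Rate, and a relative time saving from translation time versus post-editing time. Function names are assumptions, and using the post-edited segment as the reference is a choice made here for illustration.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two segments, counted in words."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def wer(mt_output, post_edited):
    """Word Error Rate of the MT output against its post-edited version."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        raise ValueError("post-edited reference is empty")
    return word_edit_distance(mt_output, post_edited) / ref_len

def productivity_gain(translation_seconds, post_editing_seconds):
    """Share of time saved by post-editing instead of translating."""
    return (translation_seconds - post_editing_seconds) / translation_seconds
```

For example, a segment that takes 100 seconds to translate from scratch and 60 seconds to post-edit gives a productivity gain of 0.4. BLEU and NIST need reference translations and are usually computed with existing scoring tools rather than reimplemented.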

Page 35: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Questions: Round 6

Are you able to measure?

Page 37: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Once upon an industry ...

Page 44: Machine Translation Master Class at the EUATC Conference by Diego Bartolome

Change before you have to

Jack Welch