www.cngl.ie Requirements for ITS2.0 support in Computer Assisted Translation Tools John Moran, Christian Saam, Anuar Serikov, Pablo Porto and Dave Lewis CNGL Trinity College Dublin

Feb 15, 2019

Page 1

Requirements for ITS2.0 support in Computer Assisted Translation Tools

John Moran, Christian Saam, Anuar Serikov, Pablo Porto and Dave Lewis

CNGL

Trinity College Dublin

Page 2

Overview

• Meta-Data and CAT Tools

• Use Cases: ITS2.0 and CAT tools

• Prototype: OmegaT

• Prototype: Web-client CAT

• Richer CAT meta-data

• Summary

Page 3

ITS 2.0 Draft Data Categories

ITS1.0

• Translate

• Localization Note

• Terminology

• Directionality

• Ruby

• Lang info

• Element within text

I18n

• Locale Filter

• External Resource

• Preserve Space

• Allowed Characters

• Storage Size

• ID Value

Language Technology

• Domain

• MT confidence

• Text Analysis

Provenance & QA

• Quality Issue

• Quality Rating

• Provenance

Page 4

Meta-Data and CAT Tools

• Meta-Data can provide useful information to translators if presented carefully

• Translation, Post-editing and Review tasks can add meta-data

• Integration with the tool chain requires a standard meta-data specification

• ITS2.0 provides new standards for several CAT use cases

• What further CAT meta-data can be leveraged?

Page 5

Meta-Data and CAT Tools

• Much ITS and ITS2.0 metadata is already implicitly supported in OmegaT and other CAT tools.

Some examples from OmegaT…

Page 6

Localization note in HTML (ITS)

Page 7

Localization note in HTML (OmegaT)

Page 8

RTL and LTR mixed in a segment (ITS)

Page 9

RTL and LTR mixed in a segment (OmegaT)

Shift + Ctrl + O

Page 10

Protected text (ITS)

Page 11

Protected text (OmegaT)

Protected text spans are not included in word counts.

One of a number of features sponsored by Welocalize in OmegaT 3.0

Page 12

ITS2.0 Confidence Scores

Page 13

ITS2.0 Confidence Scores (webcat)

http://mobile-webcat.appspot.com

Pablo Porto

Page 14

ITS2.0 Confidence Scores (OmegaT)

Anuar Serikov

ITS2.0 extensions in OmegaT

Page 15

ITS2.0 Confidence Scores (OmegaT)
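
As a rough illustration of what a CAT tool might do with ITS2.0 MT confidence, the sketch below reads its:mtConfidence values (a number between 0 and 1) from a small XML document and sorts segments into display bands. The sample document, the seg element name, the band thresholds and the function names are all assumptions made for this example; this is not how OmegaT or the web-client prototype does it.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI

# Invented sample document; element names other than the ITS attributes are assumptions.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"
     its:annotatorsRef="mt-confidence|file:///tools.xml#T1">
  <seg its:mtConfidence="0.92">Bonjour le monde</seg>
  <seg its:mtConfidence="0.41">Appuyez le bouton</seg>
  <seg>No MT here</seg>
</doc>"""


def confidence_band(score, low=0.5, high=0.8):
    """Map a 0..1 confidence score to a display band (thresholds are hypothetical)."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


def bands_for_segments(xml_text):
    """Return (segment text, band) pairs; band is None when no score is present."""
    root = ET.fromstring(xml_text)
    result = []
    for seg in root.iter("seg"):
        raw = seg.get(f"{{{ITS_NS}}}mtConfidence")
        band = confidence_band(float(raw)) if raw is not None else None
        result.append((seg.text, band))
    return result


if __name__ == "__main__":
    for text, band in bands_for_segments(SAMPLE):
        print(band, text)
```

A band (rather than the raw number) is used here only because it maps easily onto a colour or an icon in a segment list.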

Page 16

Tabular segment display option

Colours: easy to see segment status, but inflexible in some regards.

Precedence: e.g. Mark segments with Notes has precedence over Mark (Un)Translated Segments.

Sooner or later you run out of easily distinguishable colours.

Graphics contain more information.

Page 17

Walking before running…OmegaT current

Page 18

Walking before running…OmegaT dev

Anuar

Page 19

Walking before running…OmegaT dev

Anuar

Page 20

Planned…

Page 21

An idea… target terminology

Page 22

An idea… target terminology

Page 23

Tabular display should make it easier to show infringements of…

its:allowedCharactersRule

its:storageSizeRule

But other options are available, e.g. Validate Tags under the Tools menu, regular expressions, scripts plugin.
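
To make the two checks concrete, here is a minimal sketch of how a translated segment could be tested against an allowed-characters pattern and a storage-size limit. It assumes the pattern is a regular-expression character class (the kind of value its:allowedCharacters carries) and that the size limit is counted in bytes of the encoded text, with UTF-8 as the encoding. The function names are invented for this example and are not part of OmegaT or of any ITS library.

```python
import re


def disallowed_characters(text, allowed_pattern):
    """Return the characters in text that do not match the allowed pattern.

    allowed_pattern is a character class such as "[a-zA-Z]" or "[^*+]".
    """
    allowed = re.compile(allowed_pattern)
    return [ch for ch in text if not allowed.fullmatch(ch)]


def bytes_over_limit(text, max_bytes, encoding="utf-8"):
    """Return by how many bytes the encoded text exceeds max_bytes (0 if it fits)."""
    size = len(text.encode(encoding))
    return max(0, size - max_bytes)


if __name__ == "__main__":
    # A target string that contains a forbidden "*" and is too long for a 5-byte field.
    print(disallowed_characters("abc*", "[^*+]"))
    print(bytes_over_limit("Größe", 5))
```

Counting bytes rather than characters matters for targets such as German or Chinese, where a string can fit a character limit and still overflow the storage field.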

Page 24

Instrumentation in iOmegaT, based on TransLog but in a CAT tool

Uses the ITS2.0 provRef attribute to reference external provenance descriptions
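
A minimal sketch of the consuming side, assuming the provenance is stored as ITS2.0 standoff its:provenanceRecord elements whose provRef attribute holds space-separated references to external descriptions. The sample document, its URLs and the function name are invented for illustration; the actual iOmegaT output is not shown in these slides.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI

# Invented sample; the example.com URLs stand in for external instrumentation logs.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <its:provenanceRecords xml:id="pr1">
    <its:provenanceRecord tool="iOmegaT"
        provRef="http://example.com/prov/session-1 http://example.com/prov/session-2"/>
    <its:provenanceRecord revTool="iOmegaT"
        provRef="http://example.com/prov/review-1"/>
  </its:provenanceRecords>
  <seg its:provenanceRecordsRef="#pr1">Translated text</seg>
</doc>"""


def external_provenance_refs(xml_text):
    """Collect every provRef reference from the standoff provenance records."""
    root = ET.fromstring(xml_text)
    refs = []
    for record in root.iter(f"{{{ITS_NS}}}provenanceRecord"):
        # provRef may list several references separated by whitespace.
        refs.extend(record.get("provRef", "").split())
    return refs


if __name__ == "__main__":
    for ref in external_provenance_refs(SAMPLE):
        print(ref)
```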

Page 25

Instrumentation

Similar to logging, but we wanted to distinguish it from application logging (e.g. for debugging)

If you have a technology that purports to improve translator speed, it helps to be able to measure that in the field

E.g. Machine Translation, predictive typing
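
The kind of measurement meant here can be sketched as follows: given per-segment editing times recorded by an instrumented CAT tool, compute words per hour for segments translated from scratch (HT) and for post-edited MT segments, then the relative speed-up. The record layout, field names and numbers are assumptions for this example, not the iOmegaT log format.

```python
from dataclasses import dataclass


@dataclass
class SegmentEvent:
    """One hypothetical instrumentation record for a finished segment."""
    source_words: int
    seconds_editing: float
    from_mt: bool  # True if post-edited from MT, False if translated from scratch


def words_per_hour(events):
    """Total source words divided by total editing time, scaled to one hour."""
    words = sum(e.source_words for e in events)
    seconds = sum(e.seconds_editing for e in events)
    return words * 3600 / seconds if seconds else 0.0


def mt_speedup_percent(events):
    """Relative change in words/hour for MT segments compared with HT segments."""
    ht = words_per_hour([e for e in events if not e.from_mt])
    mt = words_per_hour([e for e in events if e.from_mt])
    if ht == 0:
        raise ValueError("no human-translation baseline in the data")
    return (mt - ht) / ht * 100


if __name__ == "__main__":
    log = [
        SegmentEvent(source_words=100, seconds_editing=720, from_mt=False),
        SegmentEvent(source_words=100, seconds_editing=600, from_mt=True),
    ]
    print(words_per_hour([e for e in log if not e.from_mt]))
    print(words_per_hour([e for e in log if e.from_mt]))
    print(mt_speedup_percent(log))
```

Totals are summed before dividing so that long segments weigh more than short ones; averaging per-segment speeds would give a different figure.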

Page 26

Instrumentation

Page 27

Instrumentation

Example: Dell MT versus HT (Human Translation)

carried out in Welocalize

Typical large translation project with 20 translators in 10 languages

Languages: Simplified Chinese, Chinese Taiwan, French, Italian, German, Spanish, Czech, Russian, Portuguese, Brazilian Portuguese (40 days in total)

Quality checks for all languages

Page 28

[Chart: HT Words/Hr and MT Words/Hr per translator, y-axis 0 to 1200. Translators: ch_si_tr1, ch_si_tr2, ch_ti_tr2, ch_ti_tr1, de_de_tr1, de_de_tr2, es_es_tr1, es_es_tr2, es_lx_tr1, es_lx_tr2, fr_ca_tr1, fr_ca_tr2, fr_fr_tr2, fr_fr_tr1, pt_br_tr1, pt_br_tr2, pt_pt_tr1, pt_pt_tr2, ru_ru_tr2]

MT versus Human Translation (base case)

17%+

Page 29

[Chart: HT Words/Hr and MT Words/Hr per translator, y-axis 0 to 1200]

MT versus MT

Goal: measure MT impact on translation speed on an ongoing basis

More information at try-and-see-mt.org

Page 30

OmegaT

Page 31

OmegaT (recent/upcoming developments)

Team translation using SVN / Git (incl. notes feature and lemmatized glossary)

OmegaT support in GlobalSight

LSP adoption (e.g. Velior)

SDLXLIFF support

Page 32

Summary

• Translation/Post-editing provenance and Instrumentation are becoming more important downstream

• Open source is gaining industry traction

• ITS goes mainly from content to translator. Can provenance and NLP help facilitate terminology creation?