Requirements for ITS2.0 support in Computer Assisted ... · Requirements for ITS2.0 support in Computer Assisted Translation Tools John Moran, Christian Saam, Anuar Serikov, Pablo
Post on 15-Feb-2019
www.cngl.ie
Requirements for ITS2.0 support in Computer Assisted Translation Tools
John Moran, Christian Saam, Anuar Serikov, Pablo Porto and Dave Lewis
CNGL
Trinity College Dublin
Overview
• Meta-Data and CAT Tools
• Use Cases: ITS2.0 and CAT tools
• Prototype: OmegaT
• Prototype: Web-client CAT
• Richer CAT meta-data
• Summary
ITS 2.0 Draft Data Categories
ITS1.0
• Translate
• Localization Note
• Terminology
• Directionality
• Ruby
• Lang info
• Element within text
I18n
• Locale Filter
• External Resource
• Preserve Space
• Allowed Characters
• Storage Size
• ID Value
Language Technology
• Domain
• MT confidence
• Text Analysis
Provenance & QA
• Quality Issue
• Quality Rating
• Provenance
Meta-Data and CAT Tools
• Meta-Data can provide useful information to translators if presented carefully
• Translation, Post-editing and Review tasks can add meta-data
• Integration with the tool chain requires a standard meta-data specification
• ITS2.0 provides new standards for several CAT use cases
• What further CAT meta-data can be leveraged?
Meta-Data and CAT Tools
• Much ITS and ITS2.0 metadata is already implicitly supported in OmegaT and other CAT tools.
Some examples from OmegaT…
Protected text (OmegaT)
Protected text spans are not included in word counts.
One of a number of features sponsored by Welocalize in OmegaT 3.0
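As a rough illustration of leaving protected spans out of a word count, the sketch below strips spans that match a hypothetical pattern before counting. The pattern and the function name `count_words` are assumptions made for this example and are not OmegaT's actual implementation.

```python
import re

# Hypothetical pattern for protected spans: inline tags such as <b0> or </b0>
# and printf-style placeholders such as %s. A real CAT tool's rules will differ.
PROTECTED = re.compile(r"</?[a-z]+\d+/?>|%[sd]")

def count_words(segment: str) -> int:
    """Count whitespace-separated words after removing protected spans."""
    visible = PROTECTED.sub(" ", segment)
    return len(visible.split())

# The tags and the %s placeholder are ignored; only translatable words count.
print(count_words("Click <b0>Save</b0> to store %s files."))
```

Replacing protected spans with a space rather than an empty string keeps words on either side of a tag from being fused into one.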
ITS2.0 Confidence Scores (webcat)
http://mobile-webcat.appspot.com
Pablo Porto
Tabular segment display option
Colours: easy to see segment status, but inflexible in some regards.
Precedence: e.g. Mark Segments with Notes has precedence over Mark (Un)Translated Segments.
Sooner or later you run out of easily distinguishable colours.
Graphics contain more information.
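The precedence problem with colour marking can be sketched as an ordered rule list in which the first matching rule wins. The `Segment` class, the rule names and the colours below are all hypothetical and only echo the menu items named on the slide.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source: str
    target: str = ""
    note: str = ""

# Marking rules in precedence order: the first rule that matches wins.
# Colours are made up for illustration.
RULES = [
    ("notes", lambda s: bool(s.note), "cyan"),
    ("translated", lambda s: bool(s.target), "yellow"),
    ("untranslated", lambda s: not s.target, "violet"),
]

def segment_colour(seg: Segment) -> str:
    """Return the colour of the highest-precedence rule matching the segment."""
    for _name, matches, colour in RULES:
        if matches(seg):
            return colour
    return "none"

# A translated segment with a note shows only the note colour,
# so its translated status is hidden.
print(segment_colour(Segment("Hello", "Bonjour", "check tone")))
```

Because a single background colour can carry only one state, every additional marking option has to be slotted into this ordering, which is the inflexibility the slide points at.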
Tabular display should make it easier to show infringements of:
its:allowedCharactersRule
its:storageSizeRule
But other options are available, e.g. Validate Tags under the Tools menu, regular expressions, or the scripts plugin.
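A minimal sketch of how a tool might detect infringements of these two data categories is given below. It assumes the allowed-characters value is a character class that Python's `re` module can interpret directly and that storage size is measured in bytes of the encoded text; both function names are invented for this example.

```python
import re

def check_allowed_characters(text: str, allowed: str) -> list[str]:
    """Return the characters in text that the character class `allowed` rejects."""
    pattern = re.compile(allowed)
    return [ch for ch in text if not pattern.fullmatch(ch)]

def check_storage_size(text: str, max_bytes: int, encoding: str = "utf-8") -> int:
    """Return how many bytes the encoded text exceeds max_bytes by (0 if it fits)."""
    return max(0, len(text.encode(encoding)) - max_bytes)

# Characters '*' and '+' are disallowed by this example character class.
print(check_allowed_characters("a*b", "[^*+]"))
# Non-ASCII characters take more than one byte in UTF-8.
print(check_storage_size("Grüße", 5))
```

In a tabular display, the returned characters or byte overrun could be shown in a dedicated column next to the target segment instead of being encoded as yet another colour.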
Instrumentation in iOmegaT is based on TransLog, but in a CAT tool.
The ITS2.0 provRef attribute is used to implement references to external provenance descriptions.
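To make the provRef idea concrete, the sketch below attaches an `its:provRef` attribute to an XML element using Python's standard library. The element name `seg`, the helper `annotate_with_prov_ref` and the provenance URI are hypothetical; only the ITS namespace URI and the attribute name come from the ITS 2.0 specification.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)

def annotate_with_prov_ref(elem: ET.Element, prov_uri: str) -> ET.Element:
    """Point the element at an external provenance description via its:provRef."""
    elem.set(f"{{{ITS_NS}}}provRef", prov_uri)
    return elem

seg = ET.Element("seg")  # hypothetical element holding a translated segment
seg.text = "Bonjour le monde"
# Hypothetical URI identifying an instrumentation record for this segment
annotate_with_prov_ref(seg, "http://example.org/prov/session-42")
print(ET.tostring(seg, encoding="unicode"))
```

Keeping only a reference in the translated file means the detailed instrumentation data can live in a separate store and grow without bloating the document.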
Instrumentation
Similar to logging, but we wanted to distinguish it from application logging (e.g. for debugging).
If you have a technology that purports to improve translator speed, it helps to be able to measure that in the field.
E.g. Machine Translation, predictive typing
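Measuring speed in the field comes down to aggregating words and time per working condition. The sketch below assumes instrumentation yields one record per segment with a condition label, a source word count and the seconds spent; the record layout, the function name and the sample figures are invented for illustration and are not data from any study.

```python
from collections import defaultdict

def words_per_hour(events):
    """Aggregate (condition, source_words, seconds_spent) records into words/hour."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    for condition, n_words, secs in events:
        words[condition] += n_words
        seconds[condition] += secs
    return {c: words[c] * 3600 / seconds[c] for c in words if seconds[c] > 0}

# Made-up records: HT = translated from scratch, MT = post-edited MT output.
events = [("HT", 10, 90.0), ("HT", 20, 180.0), ("MT", 12, 72.0), ("MT", 18, 108.0)]
rates = words_per_hour(events)

# Relative speed difference of MT post-editing over human translation.
gain = (rates["MT"] - rates["HT"]) / rates["HT"]
print(rates, f"{gain:.0%}")
```

Summing words and seconds before dividing weights each segment by the time actually spent on it, which is usually preferable to averaging per-segment rates.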
Instrumentation
Example: Dell MT versus HT (Human Translation), carried out in Welocalize
Typical large translation project with 20 translators in 10 languages
Languages: Simplified Chinese, Chinese Taiwan, French, Italian, German, Spanish, Czech, Russian, Portuguese, Brazilian Portuguese (40 days in total)
Quality checks for all languages
MT versus Human Translation (base case)
[Bar chart: HT Words/Hr and MT Words/Hr per translator (ch_si_tr1, ch_si_tr2, ch_ti_tr2, ch_ti_tr1, de_de_tr1, de_de_tr2, es_es_tr1, es_es_tr2, es_lx_tr1, es_lx_tr2, fr_ca_tr1, fr_ca_tr2, fr_fr_tr2, fr_fr_tr1, pt_br_tr1, pt_br_tr2, pt_pt_tr1, pt_pt_tr2, ru_ru_tr2); vertical axis from 0 to 1200]
17%+
MT versus MT
[Bar chart: HT Words/Hr and MT Words/Hr per translator; vertical axis from 0 to 1200]
Goal: measure MT impact on translation speed on an ongoing basis
More information at try-and-see-mt.org
OmegaT (recent/upcoming developments)
Team translation using SVN / Git (incl. notes feature and lemmatized glossary)
OmegaT support in GlobalSight
LSP adoption (e.g. Velior)
SDLXLIFF support