The IWSLT 2015 Evaluation Campaign

Mauro Cettolo, FBK-irst, Italy
Jan Niehues, KIT, Germany
Sebastian Stüker, KIT, Germany
Luisa Bentivogli, FBK, Italy
Roldano Cattoni, FBK, Italy
Marcello Federico, FBK-irst, Italy

IWSLT, Da Nang, 3-4 December 2015

Source: workshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf

Page 2: Outline

● IWSLT review
● TED Talks
● Tracks
● Automatic evaluation
● Human evaluation
● Future plans

Page 3: IWSLT Evaluation: record of participants

Participants per year (bar chart):

Year          2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015
Participants    13    18    15    23    18    17    19    12    15    18    21    16

Page 4: IWSLT Evaluation: record of participants

[Bar chart: total participations of 2015 participants. Bar values: 12, 10, 10, 7, 6, 6, 4, 3, 2, 1, 1, 1, 1, 1, 1, 1]

Almost 70 distinct participants in 12 years

Page 5: TED Talks

● TED LLC is non-profit
● Two annual events
● Short talks
● Variety of topics
● Website with:
  ● Videos
  ● Transcripts
  ● Translations
  ● CC License

Page 6: TED Talks Translations

              Nov '10  Nov '11        Nov '12        Nov '13        Nov '14        Nov '15
Talks (EN)    800      1,080          1,395          ~1,650         1,875          2,095
Languages     80       83             93             103            105            109
Translators   4,000    6,823          8,382          11,010         18,699         15,487
Translations  12,500   24,287 (+94%)  32,707 (+34%)  49,607 (+52%)  65,290 (+32%)  83,265 (+28%)
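The percentages in the Translations row can be re-derived from the counts. The following is a minimal Python sketch, assuming the six counts are read left to right as Nov '10 to Nov '15; the names `translations` and `yearly_growth` are illustrative and not from the slides. Under that reading, the Nov '11 to Nov '12 step comes out near 34.7%, where the slide shows +34%.

```python
# Translation counts from the slide, Nov '10 through Nov '15.
translations = [12_500, 24_287, 32_707, 49_607, 65_290, 83_265]


def yearly_growth(counts):
    """Percentage growth between consecutive counts, rounded to one decimal."""
    return [
        round(100 * (cur - prev) / prev, 1)
        for prev, cur in zip(counts, counts[1:])
    ]


growth = yearly_growth(translations)
print(growth)
```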

Page 7: Talks available at TED site (Nov 2015)

[Bar chart: number of talks per language, axis from 0 to 2500. Languages shown: Arabic, Chinese (traditional), Dutch, English, Farsi/Persian, French (France), German, Hebrew, Italian, Polish, Portuguese (Brazilian), Romanian, Russian, Slovenian, Spanish, Turkish]

Page 8: Human task: subtitling and translating

✔ segment audio
✔ transcribe and annotate
✔ split into captions
✔ translate captions

Page 9: Challenges in TED Task

● Language modelling
  ● Limited in-domain training data
  ● Variability of topics and styles
● Acoustic modelling
  ● Speaker: accent, fluency, speaking rate, style, ...
  ● Noise: mumble, applause, laughter, music, ...
● Translation modelling
  ● Distant and under-resourced languages
  ● Morphologically rich languages
● Speech Translation
  ● From spontaneous speech to polished text
  ● Detection and removal of non-speech events
  ● Subtitling and translating in real-time

Page 10: Challenges for 2011

● Language modelling
  ● Limited in-domain training data
  ● Variability of topics and styles
● Acoustic modelling
  ● Speaker: accent, fluency, speaking rate, style, ...
  ● Noise: mumble, applause, laughter, music, ...
● Translation modelling
  ● Distant and under-resourced languages
  ● Morphologically rich languages
● Speech Translation
  ● From spontaneous speech to polished text
  ● Detection and removal of non-speech events
  ● Subtitling and translating a data stream in real-time

Page 11: Challenges for 2012

Page 12: Challenges for 2013-2014

Page 13: Challenges for 2014-2015

[Pages 11 to 13 repeat the list from page 10 unchanged.]

Page 14: 2015 Tracks

● Automatic Speech Recognition (ASR)
  ● Transcription of talks from audio to text
  ● English (TED), German (TEDx)
● Spoken Language Translation (SLT)
  ● Translation of talks from audio (or ASR output) to text
  ● German → English (TEDx)
  ● English → Chinese, Czech, French, German, Thai, Vietnamese (TED)
● Machine Translation (MT)
  ● Translation of talks from text to text
  ● German → English (TEDx)
  ● English ↔ Chinese, Czech, French, German, Thai, Vietnamese (TED)

Page 15: Specifications

Conditions                   ASR   SLT   MT
Input: Pre-segmented         no    no    yes
Input: Cased & Punctuated    -     no    yes
Output: Cased & Punctuated   no    yes   yes
Automatic evaluation         yes   yes   yes(1)
Human eval (En-Fr/De)        -     -     yes

Metrics   ASR   SLT   MT
WER       ✔     ✔     ✔
BLEU      -     ✔     ✔
TER       -     ✔     ✔

(1) Non-trivial reference baselines prepared for all directions.

NEW

Page 16: Participants

Page 17: Results: ASR English (WER%)

            IWSLT15            IWSLT14            IWSLT13
System      tst2015  tst2014   tst2014  tst2013   tst2013
MITLL-AFR   6.6      7.1       9.9      13.7      15.9
HLT-I2R     7.7      8.9       -        -         -
KIT         9.2      9.7       11.4     14.2      14.4
NAIST       12.0     10.4      -        -         -
MLLP        13.3     19.5      -        -         -
IOIT        13.8     13.9      19.7     24.0      27.2

Page 18: Progress in ASR En (best systems WER%)

[Chart: WER% (axis from 0 to 16) of the best systems by campaign year (2011 to 2015), one series per test set: tst2011, tst2012, tst2013, tst2014]

Page 19: Results: ASR German (TEDx)

Page 20: Results: SLT (TEDx)

Page 21: Results: SLT

Pages 22-26: Results: MT

Page 27: Progress in MT (best systems BLEU%)

[Chart: BLEU% (axis from 10 to 45) of the best systems by campaign year (2011 to 2015), one series per direction: English-French, English-German, German-English, Chinese-English]

Page 28: Human Evaluation

● Following IWSLT 2013/14: Post-Editing + HTER
● TED task as an interesting application scenario to test the utility of MT systems in a real subtitling task
● Additional reference translations
● Edits point to specific translation errors
● HTER correlates well with human judgments
● Evaluation of MT-EnDe and MT-ViEn tasks
● Performed on 2015 test set (tst2015)

Page 29: Evaluation Dataset

Human Evaluation (HE) Set:

● a subset of tst2015
● ~10,000 words
  ● ~ first half of the 12 TED talks composing tst2015
● EnDe: 600 segments
● ViEn: 500 segments

Page 31: Evaluation Setup

Lesson learned from IWSLT 2013/2014:

● most informative and reliable HTER:
  ● not by using the targeted reference only
  ● but by exploiting all post-edits

SRC: Tôi lớn lên trong điều kiện nuôi dạy bình thường.

Targeted Reference Only
REF: I had a normal kind of upbringing .
HYP: I grew up in [normal] the conditions raised normal .
TER: 87.50

All Post-Edited References
REF: I grew up in normal raising conditions .
HYP: I grew up in [normal] the conditions raised normal .
TER: 38.46
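The effect of scoring against more than one reference can be illustrated with a simplified TER. The sketch below is an assumption-laden approximation, not the campaign's scoring tool: it counts only word insertions, deletions and substitutions (no block shifts), takes the fewest edits to any reference and divides by the mean reference length. The bracketed [normal] in the hypothesis is treated as an alignment marker rather than a token, and only the two references printed on the slide are used, so the second value is not expected to reproduce the slide's 38.46. The function names are illustrative.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # match or substitute
        prev = cur
    return prev[-1]


def simple_ter(hyp, refs):
    """Shift-free TER in percent: fewest edits to any reference / mean reference length."""
    hyp_tokens = hyp.split()
    ref_tokens = [r.split() for r in refs]
    edits = min(edit_distance(hyp_tokens, r) for r in ref_tokens)
    mean_len = sum(len(r) for r in ref_tokens) / len(ref_tokens)
    return 100.0 * edits / mean_len


hyp = "I grew up in the conditions raised normal ."
targeted = "I had a normal kind of upbringing ."
post_edit = "I grew up in normal raising conditions ."

single = simple_ter(hyp, [targeted])
multi = simple_ter(hyp, [targeted, post_edit])
print(round(single, 2), round(multi, 2))
```

Adding a reference can only lower the edit count in this formulation, which is the behaviour the slide's lesson relies on.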

Page 32: Evaluation Setup

IWSLT 2015 official evaluation:

● HTER calculated on multiple references (post-edits)
  ● EnDe: 5 participants => 5 post-edits
  ● ViEn: 5 participants => 5 post-edits

Page 35: Data Collection

● Bilingual Post-Editing
  ● professional translators were required to post-edit the MT output directly according to the source sentence
● Data preparation:
  ● 5 systems post-edited by 5 professional translators
  ● each translator must post-edit all the HE set sentences
  ● each translator must post-edit each sentence only once
  ● each MT system must be equally post-edited by all translators
● MT outputs dispatched to translators both randomly and satisfying the uniform assignment constraints
● MateCat post-editing interface
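One way to satisfy the assignment constraints listed above is a randomised Latin-square scheme. The sketch below is hypothetical: the slides do not describe the organisers' actual procedure, and `assign_outputs`, the seed and the translator labels are illustrative. It uses the five EnDe system names and the 600-segment EnDe HE set from the slides as example inputs.

```python
import random
from collections import Counter


def assign_outputs(n_sentences, systems, translators, seed=0):
    """Return {translator: [system for sentence 0, 1, ...]} using shuffled cyclic shifts."""
    k = len(systems)
    assert len(translators) == k, "sketch assumes as many translators as systems"
    rng = random.Random(seed)
    # Use every cyclic shift equally often, then randomise which sentence gets which shift.
    shifts = [i % k for i in range(n_sentences)]
    rng.shuffle(shifts)
    return {
        t: [systems[(idx + s) % k] for s in shifts]
        for idx, t in enumerate(translators)
    }


systems = ["SU", "UEDIN", "KIT", "HDU", "PJAIT"]
translators = ["PE 1", "PE 2", "PE 3", "PE 4", "PE 5"]
plan = assign_outputs(600, systems, translators)

for t in translators:
    print(t, dict(Counter(plan[t])))
```

Because each sentence gets one cyclic shift, every translator sees every sentence exactly once, the five translators see five different systems for that sentence, and each translator ends up with the same number of outputs from each system.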

Page 39: Collected Data

● Collected Post-edits
  ● 5 new references for each sentence in the HE set
● Post-editors' characteristics:
  ● PE effort (HTER): highly variable among post-editors
  ● MT outputs assigned to translators (Sys TER): very homogeneous

En-De   PE Effort  std dev  Sys TER  std dev
PE 1    22.49      16.44    56.43    20.77
PE 2    42.68      26.51    55.59    20.82
PE 3    29.21      22.18    56.00    20.49
PE 4    27.66      15.50    55.77    21.17
PE 5    22.19      17.62    56.38    20.85

Vi-En   PE Effort  std dev  Sys TER  std dev
PE 1    37.14      21.25    61.38    20.96
PE 2    40.38      20.46    60.34    20.94
PE 3    44.76      23.57    61.66    21.74
PE 4    46.39      25.71    61.69    21.59
PE 5    38.57      26.64    60.14    20.43

Pages 40-43: Evaluation Results - EnDe

System       HTER HE Set   HTER HE Set   TER HE Set   TER Test Set
Ranking      All PErefs    tgt PEref     ref          ref
SU           16.16         21.09         51.15        51.13
UEDIN        21.84         27.99         56.39        56.05
KIT          22.67         28.98         55.82        55.52
HDU          23.42         29.93         57.32        56.94
PJAIT        28.18         35.68         59.51        59.03
Rank corr.   -             1.00          0.90         0.90

Statistical Significance at p < 0.01 (Approximate Randomization)

TER/HTER reduction

Rank corr.: Spearman's Rank Coefficient
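The rank correlations for EnDe can be checked directly from the four score columns. The sketch below assumes that each "Rank corr." value compares a column against the first one (HTER on the HE set with all post-edited references) and that there are no ties; the helper names are illustrative.

```python
def ranks(scores):
    """Rank positions (1 = lowest score); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(x, y):
    """Spearman's rho for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# EnDe scores in the order SU, UEDIN, KIT, HDU, PJAIT.
hter_all = [16.16, 21.84, 22.67, 23.42, 28.18]
hter_tgt = [21.09, 27.99, 28.98, 29.93, 35.68]
ter_he = [51.15, 56.39, 55.82, 57.32, 59.51]
ter_test = [51.13, 56.05, 55.52, 56.94, 59.03]

for name, column in [("HTER tgt PEref", hter_tgt),
                     ("TER HE set", ter_he),
                     ("TER test set", ter_test)]:
    print(name, round(spearman(hter_all, column), 2))
```

Under these assumptions the three values come out as 1.0, 0.9 and 0.9; the drop from 1.0 is caused by UEDIN and KIT swapping places under reference-based TER.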

Pages 44-47: Evaluation Results - ViEn

System       HTER HE Set   HTER HE Set   TER HE Set   TER Test Set
Ranking      All PErefs    tgt PEref     ref          ref
JAIST        32.24         37.25         60.10        62.35
UMD          32.71         37.99         58.92        59.19
PJAIT        34.27*        40.50         59.48        62.20
TUT          38.50         43.42         62.49        62.69
UNETI        41.42         47.97         64.21        66.33
Rank corr.   -             1.00          0.70         0.70

Statistical Significance at p < 0.01 (* = p < 0.05) (Approximate Randomization)

TER/HTER reduction

Rank corr.: Spearman's Rank Coefficient
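The significance figures are attributed to approximate randomization. A minimal sketch of a paired version of that test follows. It is a simplification under stated assumptions: it compares mean per-segment scores, whereas a corpus-level HTER would aggregate edits over total reference length, and the two score lists are synthetic placeholders, not campaign data. The function name and parameters are illustrative.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10_000, seed=0):
    """Two-sided paired approximate randomization test on the difference of mean scores."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable per segment.
            diff += (a - b) if rng.random() < 0.5 else (b - a)
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical per-segment HTER values for two systems (not campaign data).
rng = random.Random(1)
system_a = [rng.uniform(10, 50) for _ in range(500)]
system_b = [a + rng.gauss(3.0, 8.0) for a in system_a]

p = approximate_randomization(system_a, system_b, trials=2000)
print(f"p = {p:.4f}")
```

A difference is then reported as significant when the estimated p falls below the chosen threshold, such as the 0.01 and 0.05 levels used on the slide.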

Page 48: Future

● TED task by now very seasoned
● Extend to more realistic lectures
● Work on more challenging tasks: conversations
● Include more under-resourced languages on the input side
● Discussion on co-location with another MT/NLP conference
● Continue with HE based on post-editing
● Funding by H2020 CSA CRACKER

Detailed discussion with proposals for new tasks tomorrow

Page 49: Credits

● Language resources
  ● TED LLC, USA (Talk data)
  ● Workshop on Machine Translation (Giga and news data)
  ● DFKI, Germany (United Nations data)
  ● PJAIT (Wikipedia parallel corpus)
  ● Cantab Research (LM and text corpus for TED)
  ● Many other external data providers
● Funding
  ● H2020 CSA CRACKER
  ● Internal funds of eval organizers
  ● …