Identification of Fertile Translations in Comparable Corpora:
a Morpho-Compositional Approach
Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3)
(1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina
AMTA'12, 10/31/12, San Diego, CA
Outline
1 Context and original problem
2 Compositional translation framework
3 Detailed translation method
4 Experiments and results
5 Future work
1 Context and original problem
Context
Research partly funded by a Computer-Aided Translation company
Goal: generate domain-specific bilingual lexicons when no parallel data is available
Available data:
- general-language bilingual dictionary
- domain-specific comparable corpora
Comparable corpora
Definition of comparable corpora
A set of texts in languages L1 and L2 that are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted from them
Some difficulties:
- language in target texts is not influenced by source texts
- mixed text types: technical, scientific, lay science...
⇒ do not expect parallelism in source ↔ target structures
⇒ need to deal with variation in translation
Variation in translation
Morphological variation:
- anticancer (Noun) → anticancéreux (Adj) 'anticancerous'
⇒ use of morphological derivation rules / lexicons
Lexical variation:
- radiosensitivity → Radiotoleranz 'radiotolerance'
  sensitivity ≈ tolerance
⇒ use of synonyms, thesaurus
Fertility:
- bi-dimensional → deux dimensions 'two dimensions'
⇒ scarcely addressed
Fertility
Definition: the target term has more words than the source term
Semantic fertility: the target term has more morphemes than the source term
- voie de glace 'route of ice' → ice climbing route
- aquarelle (not decomposable) → water color
Surface fertility: the target and source terms have the same number of morphemes
- bi-dimensional → deux dimensions 'two dimensions'
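The two kinds of fertility can be told apart by comparing word counts and morpheme counts. The sketch below is only an illustration of the definitions on this slide: the function name `classify_fertility` is hypothetical, and the word and morpheme segmentations are supplied by hand rather than computed.

```python
def classify_fertility(source_words, target_words, source_morphemes, target_morphemes):
    """Label a translation pair following the slide's definitions.

    All four arguments are lists of strings segmented by hand (an assumption
    of this sketch; no automatic segmentation is performed).
    """
    # A pair is fertile only if the target term has more words than the source.
    if len(target_words) <= len(source_words):
        return "not fertile"
    # Semantic fertility: the target term also has more morphemes.
    if len(target_morphemes) > len(source_morphemes):
        return "semantic"
    # Surface fertility: same number of morphemes, spread over more words.
    if len(target_morphemes) == len(source_morphemes):
        return "surface"
    return "unclassified"


# aquarelle (not decomposable) -> water color
print(classify_fertility(["aquarelle"], ["water", "color"],
                         ["aquarelle"], ["water", "color"]))
# bi-dimensional -> deux dimensions
print(classify_fertility(["bi-dimensional"], ["deux", "dimensions"],
                         ["bi", "dimensional"], ["deux", "dimensions"]))
```

The morpheme granularity (for example treating "dimensional" as one unit) follows the slide's own counting and is a choice of this sketch.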
2 Compositional translation framework
Compositional translation
Principle of compositionality
“the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]
Definition of compositional translation
The translation of the whole is a function of the translation of the parts
Compositional translation process
1 Decomposition
- “cytotoxic” → {cyto, toxic}
2 Translation
- {cyto, toxic} → {cyto, toxique}
3 Recomposition
- {cyto, toxique} → {cytotoxique, toxiquecyto}
4 Selection
- {cytotoxique, toxiquecyto} → “cytotoxique”
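The four steps above can be sketched end to end on the "cytotoxic" example. This is a minimal illustration, not the authors' implementation: the component list, the dictionary and the set of attested target words are toy data invented for the example, and the greedy decomposition is a simplification.

```python
from itertools import permutations, product

# Toy resources, assumed for illustration only.
COMPONENTS = ["cyto", "toxic"]
DICTIONARY = {"cyto": ["cyto"], "toxic": ["toxique"]}
TARGET_VOCABULARY = {"cytotoxique", "cellule", "toxique"}


def decompose(term):
    """Greedy left-to-right split of `term` into known components."""
    parts, rest = [], term
    while rest:
        match = next((c for c in COMPONENTS if rest.startswith(c)), None)
        if match is None:
            return []  # the term cannot be fully decomposed
        parts.append(match)
        rest = rest[len(match):]
    return parts


def translate(parts):
    """Every combination of dictionary translations, one per component."""
    return list(product(*(DICTIONARY.get(p, []) for p in parts)))


def recompose(translated):
    """Concatenate the translated components in every possible order."""
    return sorted({"".join(order)
                   for combo in translated
                   for order in permutations(combo)})


def select(candidates):
    """Keep only the candidates attested in the target vocabulary."""
    return [c for c in candidates if c in TARGET_VOCABULARY]


def translate_term(term):
    return select(recompose(translate(decompose(term))))


print(recompose(translate(decompose("cytotoxic"))))  # candidate forms
print(translate_term("cytotoxic"))                   # attested forms only
```

Over-generating in the recomposition step is harmless here because the selection step discards any form, such as "toxiquecyto", that is never observed on the target side.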
Relevance of compositional translation
More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
Outperforms the distributional approach for the translation of terms with compositional meaning [Morin and Daille, 2009]
Related work: single-word term translation
[Cartoni, 2009]
Prefixed word → prefixed word
- ri+organizzare → re+organiser 'reorganize'
[Harastani et al., 2012]
Neoclassical compound → neoclassical compound
- Kalori+metrie → calori+métrie 'calorimetry'
[Weller et al., 2011]
Noun compound → noun phrase
- Elektronen+mikroskop → electron microscope
⇒ Restricted to a small set of source-to-target structures
⇒ Fertility handled only in the specific case of noun compounds
Contribution I
Addressing fertility by allowing translation equivalences from a bound morpheme to an autonomous lexical item:
- cyto → cellule 'cell'
- cytotoxic → toxique (pour les) cellules 'toxic to the cells'
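One way to picture this contribution is a dictionary in which a bound morpheme has both a bound-morpheme translation and a free-word translation, with candidates then searched for in target text. The sketch below is an assumption-laden illustration: the dictionary entries, the sample sentence, the function names and the `max_gap` allowance for intervening words (suggested by the "(pour les)" in the example) are all invented for this example and are not taken from the authors' system.

```python
import re
from itertools import permutations, product

# Hypothetical entries: "cyto" maps to a bound morpheme and to a free word.
DICTIONARY = {"cyto": ["cyto", "cellules"], "toxic": ["toxique"]}


def candidate_patterns(components, max_gap=2):
    """Build one regex per ordering of the translated components.

    Components may be glued together (non-fertile translation) or separated
    by whitespace and up to `max_gap` intervening words (fertile translation).
    """
    gap = r"(?:\s+(?:\w+\s+){0,%d})?" % max_gap
    patterns = []
    for choice in product(*(DICTIONARY[c] for c in components)):
        for order in permutations(choice):
            patterns.append(r"\b" + gap.join(map(re.escape, order)) + r"\b")
    return patterns


def find_translations(components, target_text, max_gap=2):
    """Return every candidate translation attested in `target_text`."""
    found = set()
    for pattern in candidate_patterns(components, max_gap):
        found.update(m.group(0) for m in re.finditer(pattern, target_text))
    return sorted(found)


text = "Ce composé est toxique pour les cellules. Un agent cytotoxique a été testé."
print(find_translations(["cyto", "toxic"], text))
```

With this toy sentence, both the single-word form and the multi-word (fertile) form are found, because the same source component is allowed to surface either as a bound morpheme or as a free word.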
Contribution II
Larger variety of input/output structures:
SOURCE                        TARGET
prefixed word                 prefixed word
neoclassical compound         neoclassical compound
suffixed word           ⇒     suffixed word
compound                      compound
any combination               any combination
                              phrase
3 Detailed translation method
Overview
1 Decomposition
- lexicons + heuristic rules
2 Translation
- dictionary look-up
3 Recomposition
- permutations
4 Selection
- search for occurrences in target texts
Decomposition - step 1
Split the source term into minimal components with heuristic rules:
- split on hyphens
- match substrings of the source term with:
  - a list of morphemes (prefixes, confixes, suffixes)
  - a list of lexical items
- respect some length constraints on the substrings
non-cytotoxic → {non, cyto, toxic}
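The heuristics listed above might be sketched as follows. The morpheme list, the lexical-item list and the minimum substring length are toy assumptions, and the backtracking search is just one plausible way to apply the rules; the slide does not specify the exact procedure.

```python
# Toy resources, assumed for illustration only.
MORPHEMES = {"non", "cyto", "anti", "bi"}            # prefixes, confixes, suffixes
LEXICAL_ITEMS = {"toxic", "cancer", "dimensional"}
MIN_LENGTH = 2                                       # assumed length constraint


def split_chunk(chunk):
    """Segment `chunk` into known units, or return None if impossible.

    Shorter prefixes are tried first so that the units found are minimal;
    the search backtracks when the remainder cannot be segmented.
    """
    if not chunk:
        return []
    for end in range(MIN_LENGTH, len(chunk) + 1):
        head = chunk[:end]
        if head in MORPHEMES or head in LEXICAL_ITEMS:
            tail = split_chunk(chunk[end:])
            if tail is not None:
                return [head] + tail
    return None


def minimal_components(term):
    """Split on hyphens, then match substrings against the two lists."""
    components = []
    for chunk in term.lower().split("-"):
        parts = split_chunk(chunk)
        # A chunk that cannot be segmented is kept whole.
        components.extend(parts if parts is not None else [chunk])
    return components


print(minimal_components("non-cytotoxic"))
```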
Page 67
Decomposition - step 2
Generate all possible concatenations of the minimal components:
{non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
⇒ Increases the chances of matching the components with entries of the dictionaries
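One way to enumerate these concatenations is to treat each gap between two neighbouring components as either a boundary or a merge, which gives 2^(n-1) groupings for n components. The function name below is hypothetical, and the sketch assumes that only adjacent components are merged and that their order is preserved, as in the example above.

```python
from itertools import product


def concatenations(components):
    """Return every way of merging adjacent components, order preserved."""
    if not components:
        return []
    results = []
    # Each gap between neighbours is either a boundary (True) or a merge (False).
    for boundaries in product([True, False], repeat=len(components) - 1):
        grouping = [components[0]]
        for component, is_boundary in zip(components[1:], boundaries):
            if is_boundary:
                grouping.append(component)
            else:
                grouping[-1] += component
        results.append(grouping)
    return results


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```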
Page 71
Translation through direct dictionary look-up
Bilingual dictionary for lexical items:
  - toxic → toxique
Morpheme translation table for bound morphemes:
  - -cyto- → -cyto-, cellule
{-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
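The look-up step amounts to a Cartesian product over the translations of each component. The sketch below uses only the two entries shown above; the dictionary layout is hypothetical, and discarding a decomposition as soon as one component has no translation is an assumption, not something the slides state.

```python
from itertools import product

# Toy resources taken from the example above; real tables are much larger.
BILINGUAL_DICT = {"toxic": ["toxique"]}
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}


def translate_components(components):
    """Return all combinations of component translations, or [] if one is missing."""
    options = []
    for component in components:
        translations = MORPHEME_TABLE.get(component) or BILINGUAL_DICT.get(component)
        if not translations:
            return []  # assumption: one untranslatable component blocks the decomposition
        options.append(translations)
    return [list(combo) for combo in product(*options)]


print(translate_components(["-cyto-", "toxic"]))
```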
Page 75
Translation with variation
Morphological lexicon:
  - toxique → toxicité 'toxicity'
Synonyms:
  - toxique → vénéneux 'poisonous'
{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
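Variation can be sketched as a second expansion applied to each translated component. The resource layout and function names are hypothetical, and keeping the direct translation alongside its variants is an assumption made for the illustration.

```python
from itertools import product

# Toy variation resources from the example above (hypothetical structure).
MORPH_FAMILY = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word):
    """Return the word's morphological relatives and synonyms."""
    return MORPH_FAMILY.get(word, []) + SYNONYMS.get(word, [])


def expand(translation):
    """Replace each component by itself or by one of its variants, in all combinations."""
    options = [[word] + variants(word) for word in translation]
    return [list(combo) for combo in product(*options)]


for candidate in expand(["cellule", "toxique"]):
    print(candidate)
```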
Page 79
Recomposition - step 1
Permute the target components:
{-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}
Recreate target words by generating all possible concatenations of the components:
{-cyto-, toxique} → {cyto toxique}, {cytotoxique}
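Both operations can be combined in a short sketch: every ordering of the components is generated, and each gap is filled either with nothing (one word) or with a space (separate words). The function name is hypothetical, and stripping the hyphens that mark bound morphemes is an assumption about the notation.

```python
from itertools import permutations, product


def recompose(components):
    """Generate candidate target terms from every order and every joining of components."""
    if not components:
        return []
    cleaned = [c.strip("-") for c in components]  # "-cyto-" -> "cyto"
    candidates = set()
    for order in permutations(cleaned):
        # Each gap between neighbours is filled with "" (merge) or " " (separate words).
        for seps in product(["", " "], repeat=len(order) - 1):
            term = order[0]
            for sep, component in zip(seps, order[1:]):
                term += sep + component
            candidates.add(term)
    return sorted(candidates)


print(recompose(["-cyto-", "toxique"]))
```

The number of candidates grows as n! × 2^(n-1), which is one reason a filtering step is needed afterwards.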
Page 84
Recomposition - step 2
Filter out impossible target words:
  - e.g. "cyto" is a bound morpheme and cannot occur as an autonomous item
{cyto toxique}, {cytotoxique} → {cytotoxique}
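A minimal sketch of this filter, assuming a hypothetical list of bound morphemes and assuming that candidate words are separated by spaces:

```python
BOUND_MORPHEMES = {"cyto"}  # hypothetical list of morphemes that cannot stand alone


def is_possible(term):
    """Reject candidates in which a bound morpheme appears as a separate word."""
    return not any(word in BOUND_MORPHEMES for word in term.split())


candidates = ["cyto toxique", "cytotoxique"]
print([term for term in candidates if is_possible(term)])
```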
Page 87
Selection
Match target term with the words of the target corpus
Allow a maximum of 3 stop words between two words
{toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
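The matching rule can be sketched as a scan over the target corpus that tolerates a bounded run of stop words between two candidate words. The stop list is hypothetical, and matching on lemmas (so that "cellule" finds "cellules") is an assumption; the slides do not say how inflected forms are handled.

```python
STOP_WORDS = {"pour", "le", "la", "les", "de", "des", "du"}  # hypothetical stop list
MAX_GAP = 3  # at most 3 stop words between two words of the candidate


def find_match(candidate, corpus):
    """Return the first corpus span matching the candidate's lemmas, or None.

    `corpus` is a list of (surface form, lemma) pairs.
    """
    words = candidate.split()
    for start in range(len(corpus)):
        pos, matched, gap = start, 0, 0
        while pos < len(corpus) and matched < len(words):
            surface, lemma = corpus[pos]
            if lemma == words[matched]:
                matched += 1
                gap = 0
            elif matched > 0 and surface.lower() in STOP_WORDS and gap < MAX_GAP:
                gap += 1  # skip a stop word inside the term
            else:
                break
            pos += 1
        if matched == len(words):
            return " ".join(surface for surface, _ in corpus[start:pos])
    return None


corpus = [("très", "très"), ("toxique", "toxique"), ("pour", "pour"),
          ("les", "le"), ("cellules", "cellule"), ("saines", "sain")]
print(find_match("toxique cellule", corpus))
```

A candidate with no attested match returns `None` and would simply be dropped from the lexicon.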
Page 91
4 Experiments and results
Page 92
Corpora
English, French, German
breast cancer
approx. 400k words per language
1/2 scientific papers + 1/2 lay science
POS-tagged with the Xelda software (http://www.temis.com)
Comparability [Bo and Gaussier, 2010] (0 = unrelated ⇔ 1 = perfectly comparable):
  - English-French: 0.71
  - English-German: 0.45
Page 101
Source terms
Morphologically constructed words collected from the English texts
None of them has a translation in the general language dictionary that is attested in the target texts
  - English to French: 1839 source terms
  - English to German: 1824 source terms
Page 104
Resources for translation
General language dictionary (Xelda)
Domain-specific dictionary: cognates extracted from corpus [Hauer and Kondrak, 2011]
Morpheme translation table (hand-crafted)
Synonyms (Xelda)
Morphological families [Porter, 1980]
Page 110
Evaluation measures I
Coverage
$$C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|}$$
$$\sigma(ST_i) = \begin{cases} 1 & \text{if } |Trans(ST_i)| \geq 1 \\ 0 & \text{else} \end{cases}$$
⇒ % of source terms with at least 1 translation (regardless of its accuracy)
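As a small worked illustration of the coverage measure, the sketch below counts the source terms that received at least one candidate. The sample output and the function name are hypothetical.

```python
def coverage(translations):
    """Share of source terms that received at least one candidate translation."""
    if not translations:
        return 0.0
    covered = sum(1 for candidates in translations.values() if len(candidates) >= 1)
    return covered / len(translations)


# Hypothetical system output: source term -> list of generated translations.
output = {
    "cytotoxic": ["cytotoxique", "toxique pour les cellules"],
    "cardiotoxicity": ["toxicité cardiaque"],
    "immunoscore": [],
    "oncogene": [],
}
print(coverage(output))  # 2 of the 4 terms have at least one translation
```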
Page 111
Evaluation measures II
Precision
$$P = \frac{|Exact|}{|Trans|}$$
⇒ % of generated translations which are exact translations
Overall quality
$$OQ = C \times P$$
⇒ trade-off between precision and coverage
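The two measures are simple enough to check by hand. In the sketch below the counts passed to `precision` are hypothetical, while the coverage and precision passed to `overall_quality` are the English → French values reported in the results for Gen.+Morph. +SMD with fertile translations.

```python
def precision(n_exact, n_generated):
    """P = |Exact| / |Trans|."""
    return n_exact / n_generated if n_generated else 0.0


def overall_quality(c, p):
    """OQ = C x P."""
    return c * p


# Hypothetical counts: 12 exact translations among 40 generated ones.
print(round(precision(12, 40), 2))

# C = .39 and P = .33 give the reported OQ of .13 once rounded to two decimals.
print(round(overall_quality(0.39, 0.33), 2))
```

Because OQ is a product, a gain in coverage only pays off if precision does not fall proportionally more.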
Page 112
Experiments
combination of linguistic resources
quality of the lexicon with and without the fertile translations
Page 115
Results: English → French
C: coverage, P: precision, OQ: overall quality; -f / +f: without / with fertile translations
Gen.: general language dictionary, Morph.: morpheme translation table, S: synonyms, M: morphological families, D: domain-specific dictionary
avg. gain: mean difference between +f and -f, in percentage points
                     C           P           OQ
                   -f   +f     -f   +f     -f   +f
Gen.+Morph.       .04  .12    .81  .57    .03  .07
Gen.+Morph. +S    .05  .15    .69  .50    .03  .08
Gen.+Morph. +M    .11  .23    .20  .28    .02  .06
Gen.+Morph. +D    .16  .26    .70  .60    .11  .16
Gen.+Morph. +SMD  .24  .39    .31  .33    .07  .13
avg. gain           +11         -8.6        +4.8
Page 116
Results: English → German
                     C           P           OQ
                   -f   +f     -f   +f     -f   +f
Gen.+Morph.       .06  .13    .80  .35    .05  .05
Gen.+Morph. +S    .08  .16    .69  .31    .05  .05
Gen.+Morph. +M    .12  .22    .40  .23    .05  .05
Gen.+Morph. +D    .17  .26    .65  .39    .11  .10
Gen.+Morph. +SMD  .24  .36    .43  .27    .10  .10
avg. gain           +9.2        -28.4       -0.2
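The "avg. gain" rows of both result tables can be recomputed from the individual scores. The sketch below assumes the gain is the mean difference between the score with and without fertile translations, expressed in percentage points; the data layout and function name are hypothetical, and the numbers are copied from the two tables.

```python
# (without fertile, with fertile) scores copied from the two result tables,
# in row order: Gen.+Morph., +S, +M, +D, +SMD.
RESULTS = {
    "English-French": {
        "C": [(.04, .12), (.05, .15), (.11, .23), (.16, .26), (.24, .39)],
        "P": [(.81, .57), (.69, .50), (.20, .28), (.70, .60), (.31, .33)],
        "OQ": [(.03, .07), (.03, .08), (.02, .06), (.11, .16), (.07, .13)],
    },
    "English-German": {
        "C": [(.06, .13), (.08, .16), (.12, .22), (.17, .26), (.24, .36)],
        "P": [(.80, .35), (.69, .31), (.40, .23), (.65, .39), (.43, .27)],
        "OQ": [(.05, .05), (.05, .05), (.05, .05), (.11, .10), (.10, .10)],
    },
}


def average_gain(pairs):
    """Mean of (with - without), expressed in percentage points."""
    return round(100 * sum(after - before for before, after in pairs) / len(pairs), 1)


for language_pair, measures in RESULTS.items():
    gains = {name: average_gain(pairs) for name, pairs in measures.items()}
    print(language_pair, gains)
```

Under that assumption the computed gains agree with the reported rows: fertile translations raise coverage for both language pairs, but the precision loss is much larger for German, which cancels the benefit in overall quality.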
Page 117
Discussion: English-French vs. English-German results
English-German corpus is much less comparable (0.45 vs. 0.71)
Morphological types:
  - German, a Germanic language: tendency to agglutination
    oestrogen-independent → Östrogen-unabhängige
  - French, a Romance language: creates phrases more easily
    oestrogen-independent → indépendant des œstrogènes
Page 122
Error analysis
Problems in word reordering
  - self-examination → untersuchung selbst 'examination self'
Wrong or inappropriate translations
  - in-patient → pas malade 'not ill'
    in → "inside" → inside patient
    in → "inverse" → not a patient
Page 127
5 Future work
Page 128
Future work
Improve quality of linguistic resources
  - morphological derivation rules instead of stemming
  - use of a thesaurus
Try translation patterns instead of permutations
Rank translations
Page 134
Thank you for your attention.
[email protected] @univ-nantes.fr
[email protected] @lingua-et-machina.com
Page 135
ADDITIONAL SLIDES
Page 136
Exact translations
Non-fertile:
  - pathophysiological → physiopathologique
  - overactive → überaktiv
Fertile:
  - cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
  - mastectomy → ablation der brust 'ablation of the breast'
Page 137
Morphological variants
Non-fertile:
  - dosimetry → dosimétrique 'dosimetric'
  - radiosensitivity → strahlenempfindlich 'radiosensitive'
Fertile:
  - milk-producing → production de lait 'production of milk'
  - self-examination → selbst untersuchen 'self examine'
Page 138
Inexact but semantically related
Non-fertile:
  - oncogene → oncogenèse 'oncogenesis'
  - breakthrough → durchbrechen 'break'
Fertile:
  - chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
  - treatable → pouvoir le traiter 'can treat it'
Page 139
Wrong translations
Non-fertile:
  - immunoscore → immunomarquer 'immunostain'
  - check-in → unkontrollieren 'uncontrolled'
Fertile:
  - bloodstream → fliessen mehr blut 'more blood flow'
  - risk-reducing → risque de réduire 'risk of reducing'
Page 140
References I
Bo, L. and Gaussier, E. (2010).
Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009).
Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Harastani, R., Daille, B., and Morin, E. (2012).
Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011).
Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985).
Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
Morin, E. and Daille, B. (2009).
Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moiron, Springer Netherlands edition.
Namer, F. and Baud, R. (2007).
Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
Page 141
References II
Porter, M. F. (1980).
An algorithm for suffix stripping. Program, 14(3):130–137.
Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011).
Simple methods for dealing with term variation and term alignment. In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93, Paris, France.