Copyright 2011 by Sanda Harabagiu. Natural Language Processing, CS 6320, Lecture 14: Machine Translation. Instructor: Sanda Harabagiu
Natural Language Processing CS 6320 Lecture 14 Machine ...sanda/courses/NLP/Lecture14.pdf · Chinese uses less pronouns than English. ... than English relative clauses, ... nouns

Mar 11, 2018

Machine Translation

• The idea: techniques for using computers to automate some or all of the process of translating from one language to another.

• Example: translate a passage from a novel in Mandarin (the end of Chapter 45 of the 18th-century novel The Story of the Stone, also called Dream of the Red Chamber, by Cao Xue Qin, from 1792).

Dai yu zi zai chuang shang gan nian bao chai … you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.

Machine Translation

Alignment lines are drawn between Chinese words and the corresponding English glosses. Words in white boxes only appear in one of the languages.

Observations

1. Four English sentences correspond to one long Chinese sentence.

2. The word order is very different (many crossed alignments).

3. English has many more words than Chinese.

What causes the differences? Structural differences between the languages.

Structural Differences

1. Chinese rarely marks verbal aspect or tense.

• English has additional words, e.g. “turned to”, “had begun”.

2. Chinese has fewer articles than English.

• In the English translation many instances of “the” are added.

3. Chinese uses fewer pronouns than English.

• In English, the translation had to add “her” and “she”.

Other Differences

1. Stylistic differences

• Unlike English names, Chinese names are made up of regular content words with meaning. Chinese names can be translated either by transliteration, e.g. “Daiyu”, or by their corresponding meaning, e.g. “Aroma”, “Skybright”.

2. Cultural differences.

• Chinese bed-curtains are translated into English as “curtains of her bed”.

• The Chinese phrase “bamboo tip plantain leaf” is translated into English as “bamboos and plantains”.

• Translation requires a deep and rich understanding of the source language and a sophisticated, poetic, and creative command of the target language.

Is Chinese more Difficult than other Languages?

• English: Following a two-year transitional period, the new Foodstuffs Ordinance for Mineral Water came into effect on April 1, 1988. Specifically, it contains more stringent requirements regarding quality consistency and purity guarantees.

• French: La nouvelle ordonnance fédérale sur les denrées alimentaires concernant entre autres les eaux minérales, entrée en vigueur le 1er avril 1988 après une période transitoire de deux ans, exige surtout une plus grande constance dans la qualité et une garantie de la pureté.

• French gloss: THE NEW ORDINANCE FEDERAL ON THE STUFF FOOD CONCERNING AMONG OTHERS THE WATERS MINERAL CAME INTO EFFECT THE 1ST APRIL 1988 AFTER A PERIOD TRANSITORY OF TWO YEARS REQUIRES ABOVE ALL A LARGER CONSISTENCY IN THE QUALITY AND A GUARANTEE OF THE PURITY.

• The translation still has to deal with differences in word order (e.g. the location of the phrase “following a two-year transitional period”) and in structure (e.g. English uses the noun “requirements” while the French uses the verb “exige”, ‘REQUIRE’).

Models of Machine Translation

• Computational models of machine translation are useful for:

1) Tasks for which a rough translation is adequate;

2) Tasks where a human post-editor is used;

3) Tasks limited to small sublanguage domains in which fully automatic high-quality translation (FAHQT) is still achievable.

Is Rough Translation Useful?

• Example: How to cook platanos?

Is Post-Editing Useful?

• MT systems can be used to speed up the human translation process;

• They produce a draft translation that is fixed up in a post-editing phase by a human translator;

• Systems used in this way are doing computer-aided human translation (CAHT or CAT) rather than (fully automatic) machine translation.

• This model of MT is especially effective for high-volume jobs and those requiring quick turnaround, such as the translation of software manuals for localization to reach new markets.

Is Fully Automatic Translation on Specific Domains Useful?

• Example of sublanguage domain: weather forecasting

• Weather forecasts consist of phrases like:

• Cloudy with a chance of showers today and Thursday

• Outlook for Friday: Sunny

• Characteristics of sublanguage domains:

• limited vocabulary and only a few basic phrase types

• ambiguity is rare. The senses of ambiguous words are easily disambiguated based on local context, using word classes and semantic features such as WEEKDAY, PLACE, or TIME POINT.

• Other sublanguage domains:

• equipment maintenance manuals

• air travel queries

• appointment scheduling

• restaurant recommendations
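
The role of word classes in a sublanguage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon below is hypothetical, the class names SKY_CONDITION and PRECIPITATION are invented for the example (the slide names only WEEKDAY, PLACE, and TIME POINT), and real sublanguage systems use much richer context than a lookup table.

```python
# Hypothetical word-class lexicon for a tiny weather sublanguage.
# WEEKDAY and TIME_POINT follow the slide; the other classes are assumptions.
WORD_CLASSES = {
    "today": "TIME_POINT",
    "tonight": "TIME_POINT",
    "thursday": "WEEKDAY",
    "friday": "WEEKDAY",
    "cloudy": "SKY_CONDITION",
    "sunny": "SKY_CONDITION",
    "showers": "PRECIPITATION",
}


def tag_forecast(sentence):
    """Return (token, class) pairs; tokens outside the lexicon get 'OTHER'."""
    tokens = sentence.lower().replace(":", " ").split()
    return [(token, WORD_CLASSES.get(token, "OTHER")) for token in tokens]


if __name__ == "__main__":
    for pair in tag_forecast("Outlook for Friday: Sunny"):
        print(pair)
```

Because the vocabulary is so small, a class lookup of this kind already covers most of what a phrase-level translation rule would need to know about each word.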

Approaches for Machine Translation

• We will discuss the following methods for machine translation:

• Direct

• Transfer

• Interlingua

• Statistical MT

Direct, transfer, and interlingua are the classic models for doing MT.

Why is Machine Translation So Hard?

• Characteristics of language similarities or differences include:

• Systematic differences

• Idiosyncratic and lexical differences;

• Even when languages differ, these differences often have systematic structure;

• The study of systematic cross-linguistic similarities and differences is called typology.

Morphological Differences

• Morphologically, languages are characterized along two dimensions of variation:

• Number of morphemes per word, ranging from isolating languages (Vietnamese or Cantonese) to polysynthetic languages (Siberian Yupik “Eskimo”)

• In isolating languages each word generally has one morpheme

• In polysynthetic languages a single word may have many morphemes

• The degree to which morphemes are segmentable, ranging from agglutinative languages (Turkish) to fusion languages (Russian)

• In agglutinative languages morphemes have relatively clean boundaries

• In fusion languages a single affix may conflate multiple morphemes.

Syntactic Differences

• Languages are most saliently different in the basic word order of verbs, subjects, and objects in simple declarative clauses:

• German, French, English and Mandarin are SVO (Subject-Verb-Object) languages

• Hindi and Japanese are SOV languages

• Irish, Arabic and Biblical Hebrew are VSO languages

• Two languages that share their basic word-order type often have other similarities. For example, SVO languages generally have prepositions while SOV languages generally have postpositions.

Example:

Other Typological Variations

• Typological variation is also related to argument structure and the linking of predicates with their arguments.

• Difference between head-marking and dependent-marking languages.

• Head-marking languages tend to mark the relation between the head and its dependents on the head.

• Dependent-marking languages tend to mark the relation on the non-head

• Hungarian, for example, marks the possessive relation with an affix (A) on the head noun (H), where English marks it on the (non-head) possessor:

Other Typological Variations

• Typological variation in linking can also relate to how the conceptual properties of an event are mapped onto specific words.

• Languages can be characterized by whether direction of motion and manner of motion are marked on the verb or on the “satellites”: particles, prepositional phrases, or adverbial phrases.

• For example a bottle floating out of a cave would be described in English with the direction marked on the particle out, while in Spanish the direction would be marked on the verb:

Verb-Framed and Satellite-Framed Languages

• Verb-framed languages mark the direction of motion on the verb (leaving the satellites to mark the manner of motion), like Spanish acercarse ‘approach’, alcanzar ‘reach’, entrar ‘enter’, salir ‘exit’.

• Satellite-framed languages mark the direction of motion on the satellite (leaving the verb to mark the manner of motion), like English crawl out, float off, jump down, walk over to, run after.

• Verb-framed languages: Japanese, Tamil, and the many languages in the Romance, Semitic, and Mayan language families.

• Satellite-framed languages: Chinese as well as non-Romance Indo-European languages like English, Swedish, Russian, Hindi, and Farsi.

Omitting Things in Language

• Languages vary along a typological dimension related to the things they can omit.

• In some languages we can omit pronouns.

• Example from Spanish, using Ø-notation:

• Languages which can omit pronouns in these ways are called pro-drop languages.

• Referentially sparse languages, like Chinese or Japanese, that require the hearer to do more inferential work to recover antecedents are called cold languages.

• Languages that are more explicit and make it easier for the hearer are called hot languages

Other Structural Divergences

• Many structural divergences between languages are based on typological differences.

• Others are simply idiosyncratic differences that are characteristic of particular languages or language pairs.

• For example, in English the unmarked order in a noun phrase has adjectives precede nouns, but in French and Spanish adjectives generally follow nouns:

• Chinese relative clauses are structured very differently from English relative clauses, making translation of long Chinese sentences very complex.

Lexical Divergences

• Translating homonymous words correctly requires solving the same problems as word sense disambiguation.

• For example, the English source language word bass could appear in Spanish as the fish lubina or the instrument bajo.

• Even in cases of polysemy, however, we often have to disambiguate if the target language doesn’t have the exact same kind of polysemy.

More Examples of Lexical Divergences

• German, for example, uses two distinct words for what in English would be called a wall:

• Wand for walls inside a building, and

• Mauer, for walls outside a building.

• English uses the word brother for any male sibling. Both Japanese and Chinese have distinct words for older brother and younger brother (Chinese gege and didi, respectively).

• Lexical divergences can be grammatical: a word may translate best to a different part of speech in the target language. For example, the English verb like must be translated into German using the adverbial gern; she likes to sing maps to sie singt gerne (SHE SINGS LIKINGLY).
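
The one-to-many cases on this slide (wall, brother) can be pictured as a lexicon lookup that needs one extra piece of information beyond the source word. The following is a minimal sketch, assuming that information is already available as a feature; the function name, the feature labels, and the table layout are hypothetical and chosen only for illustration.

```python
# Hypothetical one-to-many lexicon: the target word depends on a feature
# that the English source word alone does not express.
LEXICON = {
    ("wall", "German"): {"inside": "Wand", "outside": "Mauer"},
    ("brother", "Chinese"): {"older": "gege", "younger": "didi"},
}


def choose_translation(word, target_language, feature):
    """Pick the target word for `word` given a disambiguating feature."""
    options = LEXICON[(word, target_language)]
    if feature not in options:
        raise ValueError(f"no translation of {word!r} for feature {feature!r}")
    return options[feature]


if __name__ == "__main__":
    print(choose_translation("wall", "German", "outside"))
    print(choose_translation("brother", "Chinese", "older"))
```

The hard part in practice is obtaining the feature: whether the wall is inside a building, or which brother is older, often has to be inferred from context outside the sentence being translated.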

Lexical Divergences

• The way that languages differ in lexically dividing up conceptual space may be more complex than this one-to-many translation problem, leading to many-to-many mappings. Figure 24.2 summarizes some of the complexities discussed by Hutchins and Somers (1992) in relating English leg, foot, and paw to the French jambe, pied, and patte.

Classical MT Approaches

• Direct Approach

• we proceed word-by-word through the source language text, translating each word as we go

• uses a large bilingual dictionary, each of whose entries is a small program with the job of translating one word

• Transfer Approach

• we first parse the input text, and then apply rules to transform the source language parse structure into a target language parse structure.

• we then generate the target language sentence from the parse structure.

• Interlingua Approach

• we analyze the source language text into some abstract meaning representation, called an interlingua.

• we then generate into the target language from this interlingual representation.

Vauquois Triangle

• Vauquois triangle - a common way to visualize classical MT approaches:

• it shows the increasing depth of analysis required as we move from the direct approach through transfer approaches, to interlingual approaches.

• it shows the decreasing amount of transfer knowledge needed as we move up the triangle, from huge amounts at the direct level, through the transfer level, to the interlingua level.

Direct Translation

Characteristics:

• we proceed word-by-word through the source language text, translating each word as we go.

• we make use of no intermediate structures, except for shallow morphological analysis; each source word is directly mapped onto some target word.

• the approach is thus based on a large bilingual dictionary; each entry in the dictionary can be viewed as a small program whose job is to translate one word

• after the words are translated, simple reordering rules can apply, for example for moving adjectives after nouns when translating from English to French.
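
The characteristics above can be sketched as a two-step procedure: a word-by-word dictionary lookup followed by a local reordering rule. This is a toy illustration, not a description of any real system. The five-entry English-to-French dictionary is hypothetical, the French forms are hard-coded for this one sentence (no agreement or morphology is computed), and the only reordering rule is the adjective-noun swap mentioned on the slide.

```python
# Hypothetical bilingual dictionary: source word -> (target word, part of speech).
BILINGUAL_DICT = {
    "the": ("la", "DET"),
    "green": ("verte", "ADJ"),
    "house": ("maison", "NOUN"),
    "is": ("est", "VERB"),
    "big": ("grande", "ADJ"),
}


def direct_translate(sentence):
    """Translate word by word, then move adjectives after the nouns they precede."""
    # Step 1: look up each source word; unknown words pass through unchanged.
    translated = [BILINGUAL_DICT.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: simple local reordering, ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(translated) - 1:
        if translated[i][1] == "ADJ" and translated[i + 1][1] == "NOUN":
            translated[i], translated[i + 1] = translated[i + 1], translated[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return " ".join(word for word, _ in translated)


if __name__ == "__main__":
    print(direct_translate("the green house is big"))
```

Note that the rule only inspects adjacent words, which is exactly why this style of system struggles with reorderings that span whole phrases.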

Direct Machine Translation

• The guiding intuition of the direct approach is that we translate by incrementally transforming the source language text into a target language text.

• Example, translating from English to Spanish:

Translation Example Using Direct Machine Approach

An Early Direct English-Russian System

Problems with Direct Translation

• It has no parsing component or any knowledge about phrasing or grammatical structure in the source or target language.

• Therefore, it cannot handle longer-distance reorderings, or those involving phrases or larger structures.

More Complex Reorderings

• The Direct Approach is too focused on individual words.

• In order to deal with real examples we’ll need to add phrasal and structural knowledge into our MT models

• Chinese – English Translation

• English – Japanese Translation

Transfer Model

• To overcome the differences in language structures, one strategy for MT is to alter the structure of the input so that it conforms to the rules of the target language.

• This can be done by applying contrastive knowledge, that is, knowledge about the differences between the two languages.

• Systems that use this strategy are said to be based on the transfer model.

• In the transfer model, MT involves three phases:

1) analysis

2) transfer – bridges the gap between the output of the source language parser and the input to the target language generator.

3) generation.

Transformation Rules – Syntactic Transfer

• Once we have parsed the source language, we’ll need rules for syntactic transfer and lexical transfer.

• The syntactic transfer rules will tell us how to modify the source parse tree to resemble the target parse tree.

• These syntactic transformations are operations that map from one tree structure to another.
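
A syntactic transfer rule can be sketched as a function from one tree to another. The example below assumes parse trees are nested tuples of the form (label, child, ...) with plain strings as leaves, and it implements a single hypothetical rule, Nominal → ADJ NOUN becoming Nominal → NOUN ADJ. The tree encoding, the node labels, and the function names are assumptions made for illustration; lexical transfer is left out, so the leaves stay in the source language.

```python
def transfer(tree):
    """Recursively apply the rule Nominal -> ADJ NOUN  =>  Nominal -> NOUN ADJ."""
    if isinstance(tree, str):  # a leaf (a word) is returned unchanged
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if (
        label == "Nominal"
        and len(children) == 2
        and children[0][0] == "ADJ"
        and children[1][0] == "NOUN"
    ):
        children = [children[1], children[0]]
    return (label, *children)


def leaves(tree):
    """Read the words off a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [leaf for child in children for leaf in leaves(child)]


if __name__ == "__main__":
    source = ("NP", ("DET", "the"), ("Nominal", ("ADJ", "green"), ("NOUN", "witch")))
    target = transfer(source)
    print(target)
    print(leaves(target))
```

Because the rule is stated over tree nodes rather than adjacent words, the same rule would still apply if the adjective or the noun were replaced by a larger subtree.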

Syntactic Transformation Example

Transfer Example from English to Japanese

Transfer Using Semantic Roles (Semantic Transfer)

Transformation Rules – Lexical Transfer

• Lexical Transfer is generally based on a bilingual dictionary, just as for direct MT

• The dictionary itself can also be used to deal with problems of lexical ambiguity. For example the English word home has many possible translations in German, including:

• nach Hause (in the sense of going home)

• Heim (in the sense of a home game),

• Heimat (in the sense of homeland, home country, or spiritual home),

• zu Hause (in the sense of being at home)
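
One way to picture a dictionary entry that handles lexical ambiguity is as a small rule list that inspects the surrounding words. The sketch below covers only the word home and the four German translations listed above. The cue words, their ordering, and the function name are hypothetical, and a real entry would rely on parsing and word sense disambiguation rather than bare keyword matching.

```python
# German translations of "home", keyed by the senses listed on the slide.
HOME_ENTRY = {
    "going_home": "nach Hause",
    "home_game": "Heim",
    "homeland": "Heimat",
    "being_at_home": "zu Hause",
}

# Hypothetical cue words for each sense, checked in order.
CUES = [
    ({"go", "going", "went"}, "going_home"),
    ({"game", "match"}, "home_game"),
    ({"country", "homeland"}, "homeland"),
    ({"at", "stay", "stayed"}, "being_at_home"),
]


def translate_home(sentence):
    """Return a German translation of 'home' based on cue words, or None."""
    words = set(sentence.lower().split())
    for cue_words, sense in CUES:
        if words & cue_words:
            return HOME_ENTRY[sense]
    return None  # no cue matched; a real system would need a fallback


if __name__ == "__main__":
    for s in ("she went home", "he is at home", "a home game"):
        print(s, "->", translate_home(s))
```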

Combining Direct and Transfer Approaches in Classic MT

• In practice, we need messy rules which combine rich lexical knowledge of both languages with syntactic and semantic features

• Commercial MT systems tend to be combinations of the direct and transfer approaches, using rich bilingual dictionaries, but also using taggers and parsers. The Systran system has three components:

• First is a shallow parsing stage, including:

• part of speech tagging

• chunking of NPs, PPs, and larger phrases

• shallow dependency parsing (subjects, passives, head-modifiers)

• Next is a transfer phase, including:

• translation of idioms,

• word sense disambiguation

• assigning prepositions based on governing verbs

• Finally, in the synthesis stage, the system:

• applies a rich bilingual dictionary to do lexical translation

• deals with reorderings

• performs morphological generation

The Interlingua Idea: Using Meaning

• One problem with the transfer model is that it requires a distinct set of transfer rules for each pair of languages.

• This is clearly suboptimal for translation systems employed in many-to-many multilingual environments like the European Union.

• The interlingua intuition: treat translation as a process of extracting the meaning of the input and then expressing that meaning in the target language.
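
The scaling argument can be made concrete with a little arithmetic. Assuming one transfer rule set per ordered language pair and, for the interlingua, one analyzer plus one generator per language, the counts for n languages are n(n - 1) and 2n. The function names and the value n = 20 are illustrative assumptions, not figures from the lecture.

```python
def transfer_rule_sets(n_languages):
    """Rule sets needed when every ordered (source, target) pair has its own."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """One analyzer into the interlingua and one generator out of it per language."""
    return 2 * n_languages


if __name__ == "__main__":
    n = 20  # hypothetical number of languages
    print(transfer_rule_sets(n), "transfer rule sets")
    print(interlingua_components(n), "interlingua components")
```

The transfer count grows quadratically while the interlingua count grows linearly, which is the motivation for the interlingua idea in many-to-many settings.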

The Interlingua Idea: Using Meaning

• The intuition presupposes the existence of a meaning representation, or interlingua, in a language-independent canonical form.

• The idea is for the interlingua to represent all sentences that mean the “same” thing in the same way, regardless of the language they happen to be in.

• Translation in this model proceeds by performing a deep semantic analysis on the input from language X into the interlingual representation and generating from the interlingua to language Y.

What Kind of Representation Scheme can We Use as an Interlingua?

• Examples of representation schemes:

• Predicate calculus

• Minimal Recursion Semantics

• Semantic decomposition into some kind of atomic semantic primitives

• Event-based representation, in which events are linked to their arguments via a small fixed set of thematic roles using the semantic analyzer techniques

• The interlingua thus requires more analysis work than the transfer model, which only required syntactic parsing.

• But generation can now proceed directly from the interlingua with no need for syntactic transformations.
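
An event-based interlingua with thematic roles, and generation from it, can be sketched as follows. Everything here is a simplifying assumption: the representation is a plain dictionary, the concept names are invented, tense is ignored, and the Japanese case particles (ga, o) are folded into the toy lexicon rather than produced by a grammar. The point is only that both outputs come from the same meaning representation with no source-to-target transformation step.

```python
# A hypothetical event-based meaning representation with two thematic roles.
MEANING = {"event": "READING", "agent": "MARY", "theme": "BOOK"}

# Toy generation lexicons mapping concepts to surface strings.
ENGLISH = {"READING": "reads", "MARY": "Mary", "BOOK": "the book"}
JAPANESE = {"READING": "yomu", "MARY": "Mary ga", "BOOK": "hon o"}


def generate(meaning, lexicon, role_order):
    """Realize the roles of `meaning` in the given order using `lexicon`."""
    return " ".join(lexicon[meaning[role]] for role in role_order)


if __name__ == "__main__":
    # English is SVO, Japanese is SOV; only the role order and lexicon differ.
    print(generate(MEANING, ENGLISH, ("agent", "event", "theme")))
    print(generate(MEANING, JAPANESE, ("agent", "theme", "event")))
```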

Problems with Interlingua Model

• The interlingual model has its own problems:

• For example, in order to translate from Japanese to Chinese the universal interlingua must include concepts such as ELDER-BROTHER and YOUNGER-BROTHER.

• Using these same concepts when translating from German to English would then require large amounts of unnecessary disambiguation.

• The interlingua commitment requires exhaustive analysis of the semantics of the domain and formalization into an ontology.

• Generally this is only possible in relatively simple domains based on a database model, as in the air travel, hotel reservation, or restaurant recommendation domains, where the database definition determines the possible entities and relations.

• For these reasons, interlingual systems are generally only used in sublanguage domains.