Week 9: resources for globalisation

Post on 07-Jan-2016

Finish spell checkers
Machine Translation (MT)
    The ‘decoding’ paradigm
    Ambiguity
    Translation models
    Interlingua and First Order Predicate Calculus
    Human involvement
    Historical note

Spelling dictionaries

Implementing spelling identification and correction algorithm

STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled

STAGE 2: generate list of candidates
    Apply any single transformation to the typo string
    Filter the list by checking against a dictionary

STAGE 3: assign probability values to each candidate in the list

STAGE 4: select best candidate
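Stages 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the word list `DICTIONARY` is a made-up toy, and the four transformations used here (deletion, transposition, substitution, insertion) are an assumption about what "single transformation" covers.

```python
import string

# Hypothetical toy word list; a real checker would load a full dictionary.
DICTIONARY = {"the", "then", "they", "he", "tea", "ten"}

def is_misspelled(word, dictionary=DICTIONARY):
    """STAGE 1: a string is marked as misspelled if it is not a legal string."""
    return word not in dictionary

def single_edits(typo, alphabet=string.ascii_lowercase):
    """Return every string that is one transformation away from the typo."""
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = [a + b[1:] for a, b in splits if b]
    transpositions = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutions = [a + ch + b[1:] for a, b in splits if b for ch in alphabet]
    insertions = [a + ch + b for a, b in splits for ch in alphabet]
    return set(deletions + transpositions + substitutions + insertions)

def candidates(typo, dictionary=DICTIONARY):
    """STAGE 2: generate the edits, then filter them against the dictionary."""
    return sorted(single_edits(typo) & dictionary)
```

With this toy word list, `candidates("teh")` keeps "tea", "ten" and "the", and discards the many generated strings that are not legal words.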

Spelling dictionaries: STAGE 3

Prior probability
    Given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
    P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in the corpus

Likelihood
    Given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
    P(t|c), calculated using a corpus of errors, or transformations

Bayesian rule
    Get the product of the prior probability and the likelihood: P(c) × P(t|c)
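The ranking step can be sketched as follows: each candidate is scored by the product of its prior and its likelihood, and the highest score wins. All the numbers below (corpus counts, corpus size, likelihood values) are invented for illustration; in practice they would be estimated from a text corpus and a corpus of errors.

```python
from collections import Counter

# Hypothetical corpus statistics, invented for illustration only.
CORPUS_COUNTS = Counter({"the": 500, "tea": 30, "ten": 70})
CORPUS_SIZE = 10_000  # N: assumed number of words in the corpus

# Hypothetical likelihoods for the typo "teh", keyed by (typo, candidate).
# A real system would estimate these from a corpus of errors.
LIKELIHOOD = {
    ("teh", "the"): 0.01,   # transposition of 'h' and 'e'
    ("teh", "tea"): 0.001,  # substitution of 'a' by 'h'
    ("teh", "ten"): 0.001,  # substitution of 'n' by 'h'
}

def prior(candidate):
    """Prior probability: count of the candidate divided by N."""
    return CORPUS_COUNTS[candidate] / CORPUS_SIZE

def score(typo, candidate):
    """Product of the prior probability and the likelihood."""
    return prior(candidate) * LIKELIHOOD.get((typo, candidate), 0.0)

def best_candidate(typo, candidate_list):
    """STAGE 4: select the candidate with the highest score."""
    return max(candidate_list, key=lambda c: score(typo, c))
```

Under these assumed figures, "the" wins for the typo "teh" because both its prior and its likelihood are higher than those of "tea" and "ten".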

Spelling dictionaries: non-word errors

Implementing spelling identification and correction algorithm
    STAGE 1: identify misspelled words
    STAGE 2: generate list of candidates
    STAGE 3a: rank candidates by probability
    STAGE 3b: select best candidate

Implement:
    noisy channel model
    Bayesian Rule

Resources for Globalisation: Machine translation

The ‘decoding’ paradigm
    Assumes a one-to-one relation between source symbol and target symbol
    In practice the relation is often:
        one-to-many (homonymy)
        one-to-many (hypernym → hyponyms)
        many-to-one (hyponyms → hypernym)

Machine translation

The ‘decoding’ paradigm
    one-to-many (homonymy)
        bank → Ufer, Bank (German)
    one-to-many (hypernym → hyponyms)
        brother → otooto, oniisan (Japanese)
        blue → синий, голубой (Russian)
    many-to-one (hyponyms → hypernym)
        hill, mountain → Berg (German)
        learn, teach → leren (Dutch)
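A small lookup table makes the problem concrete. The sketch below stores some of the examples above in a toy lexicon and shows that a symbol-for-symbol 'decoder' only works when a source word has exactly one target word. The names `LEXICON` and `decode`, and the language codes, are hypothetical.

```python
# Toy lexicon built from the slide's examples; keys are (English word, target language).
LEXICON = {
    ("bank", "de"): ["Ufer", "Bank"],          # one-to-many (homonymy)
    ("brother", "ja"): ["otooto", "oniisan"],  # one-to-many (hypernym -> hyponyms)
    ("blue", "ru"): ["синий", "голубой"],      # one-to-many (hypernym -> hyponyms)
    ("hill", "de"): ["Berg"],                  # many-to-one (hyponyms -> hypernym)
    ("mountain", "de"): ["Berg"],
}

def decode(word, target):
    """Return the target word, but only if the mapping is one-to-one."""
    options = LEXICON.get((word, target), [])
    if len(options) != 1:
        raise LookupError(
            f"{word!r} has {len(options)} possible translations in {target!r}: {options}"
        )
    return options[0]
```

Decoding "hill" or "mountain" into German succeeds (both give "Berg", so the distinction is lost), whereas decoding "bank" raises an error because the table alone cannot choose between "Ufer" and "Bank".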

Machine translation and globalisation

Ambiguity: ‘I made her duck’

“The possibility of interpreting an expression in two or more distinct ways”

Collins English Dictionary

Machine translation: Ambiguity

The difficulty of translation depends on the level of ambiguity that arises

This depends on the closeness of the source and target languages with respect to the following:
    vocabulary: homonyms
    grammar: structural ambiguity
    conceptual structure: specificity ambiguity, lexical gaps

Machine translation

Pragmatic approach
    aim for a rough translation, a ‘gist’ translation
        used for multi-lingual information retrieval
    involve human translators in the process: computer-aided translation

Machine translation

Translation models: Transfer model

‘the dog bit my friend’
    Hindi: kutte-ne mere dost-ko kata
    gloss: dog my friend bit

Machine translation

Translation models: Transfer model
    Alter the grammatical structure of the source language to make it adhere to the grammatical structure of the target language
    Use transformation rules
        Analysis process (source)
        Transfer process (‘bridge’)
        Generation process (target)
    Problem: each source-target pair will need its own unique set of transformation rules
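The analysis and transfer steps can be sketched for the dog example. This is a deliberately naive illustration: `analyse` only handles a simple subject-verb-object clause with a known verb, the rule table `TRANSFER_RULES` is hypothetical, and the generation step (choosing target-language words) is left out.

```python
from typing import NamedTuple

class Clause(NamedTuple):
    subject: str
    verb: str
    obj: str

# Hypothetical rule table: one entry is needed per (source, target) pair,
# which is exactly the scaling problem noted above.
TRANSFER_RULES = {
    ("en", "hi"): ("subject", "obj", "verb"),  # subject-verb-object -> subject-object-verb
}

def analyse(sentence, verbs=frozenset({"bit"})):
    """Analysis (source): split a simple clause around its verb."""
    words = sentence.split()
    verb_index = next(i for i, word in enumerate(words) if word in verbs)
    return Clause(
        subject=" ".join(words[:verb_index]),
        verb=words[verb_index],
        obj=" ".join(words[verb_index + 1:]),
    )

def transfer(clause, source, target):
    """Transfer ('bridge'): reorder the constituents for the target language."""
    order = TRANSFER_RULES[(source, target)]
    return [getattr(clause, role) for role in order]
```

For "the dog bit my friend", the transfer step yields the order "the dog", "my friend", "bit", which matches the word order of the Hindi gloss. Requesting a pair that has no entry in the table fails with a `KeyError`, since no rules exist for it.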

Machine translation

Translation models: Inter-lingua model
    Extract the meaning from the source string
    Give it a language-independent representation, i.e. an interlingua
    Translation process takes the interlingua as its input
    Multiple translation processes take the same input for multiple target language outputs

Machine translation

Translation models: What is the inter-lingua?

For words, some sort of semantic analysis, e.g.

    Interlingua: (GO, BY-FOOT)    (GO, BY-TRANSPORT)
    Russian:     идти             ехать
    English:     go               go
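The idea of several generation processes sharing one input can be sketched with the example above. The concept tuples come from the slide; the table `GENERATORS` and the function names are hypothetical, and the analysis step that would produce the concept from a source sentence is not shown.

```python
# One generation table per target language, all keyed by the same
# language-independent concepts.
GENERATORS = {
    "ru": {("GO", "BY-FOOT"): "идти", ("GO", "BY-TRANSPORT"): "ехать"},
    "en": {("GO", "BY-FOOT"): "go", ("GO", "BY-TRANSPORT"): "go"},
}

def generate(concept, language):
    """Produce the word for one interlingua concept in one target language."""
    return GENERATORS[language][concept]

def generate_all(concept):
    """Feed the same interlingua input to every available generator."""
    return {language: table[concept] for language, table in GENERATORS.items()}
```

Adding a new target language then means adding one more table, rather than one set of rules per language pair as in the transfer model.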

Machine translation and globalisation

Translation models: What is the inter-lingua?

For sentences, a logical language, e.g. First Order Predicate Calculus

Meaning representation: Goal

1. The semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world
2. The representation must be expressive, i.e. handle different types of data

Meaning representation: First Order Predicate Calculus

computationally tractable
objects (terms)
properties of objects
relations amongst objects
predicate-argument structure
large composite representations
logical connectives

Meaning representation: First Order Predicate Calculus

Object: referred to uniquely by a term
    constant, e.g. SurreyUniversity
    function, e.g. LocationOf(SurreyUniversity)
    variable

Meaning representation: First Order Predicate Calculus

Relations amongst objects
    Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M)
        Educates(SurreyUniversity, Citizens)    two-place predicate
    Predicates can also specify the category of an object
        University(SurreyUniversity)    one-place predicate

Meaning representation: First Order Predicate Calculus

Properties / parts of objects
    functions: LocationOf(SurreyUniversity)

Composite representations through predicates and functions:
    Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))

Meaning representation: First Order Predicate Calculus

Logical connectives
    combine basic representations to form larger, more complex representations
    e.g. the ∧ operator = ‘and’
    Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
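To show how terms, predicates, functions and connectives fit together, here is a minimal sketch that checks formulas against a tiny hand-written model. Everything about the model is an assumption made for illustration: the location constants `Loc1` and `Loc2` are hypothetical, and any fact not listed in `FACTS` is treated as false (a closed-world assumption that First Order Predicate Calculus itself does not make).

```python
# Hypothetical model of the world: the atomic facts taken to be true.
FACTS = {
    ("University", ("SurreyUniversity",)),
    ("Educates", ("SurreyUniversity", "Citizens")),
    ("Near", ("Loc1", "Loc2")),
}

# Functions map an object to another object (here, to made-up location constants).
FUNCTIONS = {
    ("LocationOf", "SurreyUniversity"): "Loc1",
    ("LocationOf", "Cathedral"): "Loc2",
}

def eval_term(term):
    """A term is a constant (a string) or a function application (name, argument)."""
    if isinstance(term, tuple):
        name, argument = term
        return FUNCTIONS[(name, eval_term(argument))]
    return term

def holds(formula):
    """Formulas: ("and", f, g), ("not", f), or an atom (predicate, [terms])."""
    operator = formula[0]
    if operator == "and":
        return holds(formula[1]) and holds(formula[2])
    if operator == "not":
        return not holds(formula[1])
    predicate, arguments = formula
    return (predicate, tuple(eval_term(a) for a in arguments)) in FACTS
```

The composite representation Near(LocationOf(SurreyUniversity), LocationOf(Cathedral)) is written here as `("Near", [("LocationOf", "SurreyUniversity"), ("LocationOf", "Cathedral")])`, and a conjunction with a negated atom is built by nesting `"and"` and `"not"` tuples in the same way.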

Machine translation and globalisation: change of priorities

1954: IBM and Georgetown University, first MT demo
    goal: ‘perfect’ translation
1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of goal
Post ALPAC
    goal: rough translation, involve human element
Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN, whose goal was rough translation
Journal of Machine Translation

Next week

Globalisation as an industry
SDL and the SDLX-TRADOS globalisation application
