Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik

Mar 31, 2015

Deep Grammars in Hybrid Machine Translation

University of Bergen

Helge Dyvik

Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian

A 4-year project (2002-2006) involving groups at:

• The University of Oslo

• The University of Bergen

• NTNU (Norwegian University of Science and Technology, Trondheim)

Cooperation with PARC (John Maxwell) and others

The LOGON system

Schematic architecture

XLE: Xerox Linguistic Environment

A platform developed over more than 20 years at Xerox PARC (now PARC)

Developer: John Maxwell

• LFG grammar development

• Parsing

• Generation

• Transfer

• Stochastic parse selection

• Interaction with shallow methods

An LFG analysis:

Det regnet 'It rained'

ParGram: The Parallel Grammar Project

A long-term project (1993-)

• Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese

• ‘Parallel grammars’ means parallel f-structures:

- A common inventory of features

- Common principles of analysis

LOGON Analysis Modules

Processing chain (the output of each step is the input to the next):

Input string

• Tokenization • Named entities • Compounds • Morphology

String of stems and tags

XLE Parser (NorGram)

c-structures, f-structures, MRSs

Supporting knowledge bases:

• Norsk ordbank lexicon

• LFG lexicons: NKL-derived, hand coded

• Lexical templates

• Syntactic rules, rule templates

Scope of NorGram

Lexicon: about 80 000 lemmas. In addition:

• Automatically analyzed compounds

• Automatically recognized proper names

• "Guessed" nouns

Syntax: 229 complex rules, giving rise to about 48 000 arcs

Semantics: Minimal Recursion Semantics projections for all readings

Coverage

Performance on an unknown corpus of newspaper text:

• 17 randomly selected pieces of text, limited to coherent text

• comprising 1000 sentences

• taken from 9 newspapers: Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys

• from the editions of November 11, 2005

The LOGON challenge:

From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.

Semantics for translation: two issues

• The representational subset problem

- Desirable: normalization to flat structures with unordered elements.

• Complete and detailed semantic analyses may be unnecessary.

- Desirable: rich possibilities of underspecification.
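As a rough illustration of why flat, unordered structures help, the sketch below treats a conjunction of predications as a multiset (a Python `collections.Counter`). Two differently bracketed conjunctions of the same predications, which a nested logical form keeps apart, normalize to the same bag. The `("and", left, right)` encoding and the example predications are assumptions made for illustration only.

```python
from collections import Counter


def conjuncts(term):
    """Flatten a nested binary conjunction ("and", left, right) into a multiset of conjuncts."""
    if isinstance(term, tuple) and term[0] == "and":
        return conjuncts(term[1]) + conjuncts(term[2])
    return Counter([term])  # an atomic predication counts once


# Hypothetical example: "big white horse" bracketed in two different ways.
left_branching = ("and", ("and", "big(x)", "white(x)"), "horse(x)")
right_branching = ("and", "big(x)", ("and", "white(x)", "horse(x)"))

if __name__ == "__main__":
    print(left_branching == right_branching)                        # False: different nesting
    print(conjuncts(left_branching) == conjuncts(right_branching))  # True: same flat bag
```

A multiset rather than a set is used so that repeated predications are not silently merged.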

Basics of Minimal Recursion Semantics

• Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag

• A framework for the representation of semantic information

• Developed in the context of HPSG and machine translation (Verbmobil)

• Sources of inspiration:

- Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope

- Shake-and-bake translation (P. Whitelock): a bag of words as interface structure

An MRS representation

• is a bag of semantic entities (some corresponding to words, some not), each with a handle,

• plus a bag of handle constraints allowing the underspecification of scope,

• plus a handle and an index.

• Each semantic entity is referred to as an Elementary Predication (EP).

• Relations among EPs are captured by means of shared variables.

• There are three elementary variable types:

- handles (or 'labels') (h)

- events (e)

- referential indices (x)
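The ingredients listed above map naturally onto a small data structure. The following Python sketch is one possible encoding, not the LOGON or XLE format: each EP carries a handle, a predicate and role-labelled arguments, and the MRS adds a top handle, an index and a bag of handle constraints. The predicate names (`_def_q`, `_katt_n`, `_sove_v`), the role names (`ARG0`, `ARG1`, `RSTR`, `BODY`) and the `qeq` constraint type are assumptions modelled on common MRS practice, applied here to «Katten sover» 'The cat sleeps'.

```python
from dataclasses import dataclass, field

VARIABLE_TYPES = {"h": "handle", "e": "event", "x": "referential index"}


def variable_type(var: str) -> str:
    """Read the elementary variable type off the name prefix (h, e or x)."""
    return VARIABLE_TYPES[var[0]]


@dataclass(frozen=True)
class EP:
    """An elementary predication: its handle (label), a predicate and role/variable pairs."""
    label: str
    pred: str
    args: tuple  # e.g. (("ARG0", "e2"), ("ARG1", "x4"))


@dataclass
class MRS:
    top: str                                   # the handle of the whole MRS
    index: str                                 # the distinguished index (here the main event)
    rels: list = field(default_factory=list)   # bag of EPs
    hcons: list = field(default_factory=list)  # bag of (hole, relation, label) constraints

    def eps_sharing(self, var: str) -> list:
        """EPs related to each other through the shared variable `var`."""
        return [ep for ep in self.rels if var in dict(ep.args).values()]


# Hypothetical MRS for «Katten sover» 'The cat sleeps'.
katten_sover = MRS(
    top="h1",
    index="e2",
    rels=[
        EP("h3", "_def_q", (("ARG0", "x4"), ("RSTR", "h5"), ("BODY", "h6"))),
        EP("h7", "_katt_n", (("ARG0", "x4"),)),
        EP("h8", "_sove_v", (("ARG0", "e2"), ("ARG1", "x4"))),
    ],
    hcons=[("h1", "qeq", "h8"), ("h5", "qeq", "h7")],
)

if __name__ == "__main__":
    print([ep.pred for ep in katten_sover.eps_sharing("x4")])  # ['_def_q', '_katt_n', '_sove_v']
    print(variable_type(katten_sover.index))                    # event
```

The shared variable `x4` is what ties the quantifier, the noun and the verb's first argument together; no EP is embedded inside another.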

From standard logical form to MRS

«Every ferry crosses some fjord»

Two readings:

Replace operators with generalized quantifiers:

every(variable, restriction, body)

some(variable, restriction, body)

The first reading (wide-scope every), with each quantifier's arguments labelled var, restriction and body.
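For reference, the two readings of «Every ferry crosses some fjord» can be written in standard first-order notation and then with the generalized quantifiers every(variable, restriction, body) and some(variable, restriction, body). This is a reconstruction in conventional notation; the symbols used on the original slide may differ.

```latex
\begin{align*}
&\text{Wide-scope every:} && \forall x\,\bigl(\mathit{ferry}(x) \rightarrow \exists y\,(\mathit{fjord}(y) \wedge \mathit{cross}(x,y))\bigr)\\
& && \mathit{every}\bigl(x,\ \mathit{ferry}(x),\ \mathit{some}(y,\ \mathit{fjord}(y),\ \mathit{cross}(x,y))\bigr)\\
&\text{Wide-scope some:} && \exists y\,\bigl(\mathit{fjord}(y) \wedge \forall x\,(\mathit{ferry}(x) \rightarrow \mathit{cross}(x,y))\bigr)\\
& && \mathit{some}\bigl(y,\ \mathit{fjord}(y),\ \mathit{every}(x,\ \mathit{ferry}(x),\ \mathit{cross}(x,y))\bigr)
\end{align*}
```

In the generalized-quantifier form, the implication and conjunction of the first-order formulas are absorbed into the restriction/body split of each quantifier.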

Make the structure flat:

• give each EP a handle

• replace embedded EPs by their handles

• collect all EPs on the same level (understood as conjunction)
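The three flattening steps can be sketched in Python for the first reading. Terms are encoded as `(predicate, [arguments])` tuples, where an argument is either a variable name or an embedded term; this encoding and the handle names `h1`, `h2`, ... are assumptions for illustration. Each EP gets a fresh handle, embedded EPs are replaced by their handles, and the result is one flat list read as a conjunction.

```python
from itertools import count


def flatten(term):
    """Flatten a nested term into (top handle, bag of EPs); each EP is (handle, predicate, args)."""
    fresh = count(1)
    eps = []

    def visit(t):
        handle = f"h{next(fresh)}"             # step 1: give each EP a handle
        pred, args = t
        flat_args = []
        eps.append((handle, pred, flat_args))  # step 3: collect every EP on one level
        for arg in args:
            # step 2: replace an embedded EP by its handle; variables stay as they are
            flat_args.append(visit(arg) if isinstance(arg, tuple) else arg)
        return handle

    top = visit(term)
    return top, [(h, p, tuple(a)) for h, p, a in eps]


# First reading: every(x, ferry(x), some(y, fjord(y), cross(x, y)))
wide_every = ("every", ["x", ("ferry", ["x"]),
                        ("some", ["y", ("fjord", ["y"]), ("cross", ["x", "y"])])])

if __name__ == "__main__":
    top, bag = flatten(wide_every)
    print(top)  # h1
    for ep in bag:
        print(ep)  # e.g. ('h1', 'every', ('x', 'h2', 'h3'))
```

At this point the scope is still fully fixed: the handles `h2`, `h3`, `h4`, `h5` in argument positions record exactly which EP sits where.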

Underspecified scope by means of handle constraints:

Wide scope: some

Wide scope: every
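In the underspecified version, the quantifiers' body arguments are left open, and the top handle and the restrictions are related to their contents by handle constraints. The Python sketch below assumes the `qeq` ("equal modulo quantifiers") constraints of standard MRS, one EP per handle, and a hypothetical numbering of handles. It enumerates all one-to-one pluggings of holes with labels by brute force and keeps those that form a single tree, satisfy every `qeq` constraint, and leave no variable outside the scope of its quantifier. Real MRS scope resolvers are far more efficient; this only shows what counts as a solution.

```python
from itertools import permutations

# Hypothetical underspecified MRS for "Every ferry crosses some fjord".
# label -> (predicate, variables, holes); a quantifier has two holes: (restriction, body).
EPS = {
    "h1": ("every", ["x"], ["h2", "h3"]),
    "h4": ("ferry", ["x"], []),
    "h5": ("some", ["y"], ["h6", "h7"]),
    "h8": ("fjord", ["y"], []),
    "h9": ("cross", ["x", "y"], []),
}
TOP = "h0"
HCONS = [("h0", "h9"), ("h2", "h4"), ("h6", "h8")]  # (hole, label): hole qeq label


def reached_labels(plug):
    """Labels reachable from TOP, or None if some label is reached twice."""
    seen, stack = [], [plug[TOP]]
    while stack:
        label = stack.pop()
        if label in seen:
            return None
        seen.append(label)
        stack.extend(plug[h] for h in EPS[label][2])
    return seen


def qeq(plug, hole, label):
    """hole =q label: the hole holds label directly or via a chain of quantifier bodies."""
    current = plug[hole]
    while current != label:
        holes = EPS[current][2]
        if len(holes) != 2:       # not a quantifier: the chain cannot continue
            return False
        current = plug[holes[1]]  # descend into the quantifier's body
    return True


def bound_ok(plug, label, bound=frozenset()):
    """Every variable must occur inside the scope of the quantifier that binds it."""
    _, variables, holes = EPS[label]
    if holes:  # quantifier: its variable is bound in restriction and body
        inner = bound | {variables[0]}
        return all(bound_ok(plug, plug[h], inner) for h in holes)
    return set(variables) <= bound


def render(plug, label):
    pred, variables, holes = EPS[label]
    args = variables + [render(plug, plug[h]) for h in holes]
    return f"{pred}({', '.join(args)})"


def scoped_readings():
    holes = [TOP] + [h for _, _, hs in EPS.values() for h in hs]
    readings = []
    for labels in permutations(EPS):  # brute force over all one-to-one pluggings
        plug = dict(zip(holes, labels))
        reached = reached_labels(plug)
        if reached is None or len(reached) != len(EPS):
            continue  # not a single tree rooted at TOP
        if all(qeq(plug, h, l) for h, l in HCONS) and bound_ok(plug, plug[TOP]):
            readings.append(render(plug, plug[TOP]))
    return readings


if __name__ == "__main__":
    for reading in scoped_readings():
        print(reading)
```

Run as a script, it prints exactly two lines, `every(x, ferry(x), some(y, fjord(y), cross(x, y)))` and `some(y, fjord(y), every(x, ferry(x), cross(x, y)))`: the wide-scope every and wide-scope some readings. Other pluggings that satisfy the `qeq` constraints, such as putting some inside the restriction of every, are rejected because they leave a variable of cross unbound.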

MRS as feature structure (also adding event variables):

Norwegian translation: «Hver ferge krysser en fjord»

Projecting MRS representations from f-structures

«Katten sover» 'The cat sleeps'

(Figure: structure on the mrs:: projection)

(Figure: two structures on the mrs:: projection)

Composition: top-level MRS with unions of HCONS and RELS:
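A minimal sketch of the composition step, assuming each contributing structure on the mrs:: projection can be reduced to a dictionary with `RELS` and `HCONS` lists: the top-level MRS keeps its own top handle and index and collects the union of everything its parts contribute. Lists (bags) rather than sets keep duplicate EPs distinct. The feature name `LTOP`, the string encoding of EPs and constraints, and the contributions for «Katten sover» are hypothetical.

```python
def compose(ltop, index, *parts):
    """Build a top-level MRS whose RELS and HCONS are the bag unions of the parts'."""
    rels, hcons = [], []
    for part in parts:
        rels.extend(part["RELS"])
        hcons.extend(part["HCONS"])
    return {"LTOP": ltop, "INDEX": index, "RELS": rels, "HCONS": hcons}


# Hypothetical contributions for «Katten sover» 'The cat sleeps'.
subject = {"RELS": ["h3:_def_q(x4, h5, h6)", "h7:_katt_n(x4)"], "HCONS": ["h5 qeq h7"]}
verb = {"RELS": ["h8:_sove_v(e2, x4)"], "HCONS": []}
clause = {"RELS": [], "HCONS": ["h1 qeq h8"]}

if __name__ == "__main__":
    mrs = compose("h1", "e2", subject, verb, clause)
    print(mrs["RELS"])
    print(mrs["HCONS"])
```

Because composition is just union, the order in which parts are combined does not change the resulting bags' contents, which fits the flat, unordered design of MRS.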

Post-processing this structure brings us back to the LOGON MRS format:

http://decentius.aksis.uib.no/logon/xle-mrs.xml

bil 'car' (as in "Han kjøpte bil" 'He bought [a] car')

No SPEC

disse hans mange spørsmål 'these his many questions'

Multiple SPECs

Han jaget barnet ut nakent 'He chased the child out naked'

The Transfer Component

Developer of the formalism: Stephan Oepen

Example of transfer

Source sentence:

Henter han bilen sin?

fetches he car.DEF POSS.REFL.SG.MASC

'Does he fetch his car?'

Alternative reading: 'Does he fetch the one of the car?'

Parse output:

Choosing the first reading of Henter han bilen sin?

The variables have features. Interrogative is coded as [SF ques] on the event variable.
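Because the interrogative is a property of the event variable rather than an EP of its own, a program can read the sentence type straight off the MRS index. The sketch below assumes a dictionary encoding with a `VARS` table. Only `SF ques` comes from the slide; the `TENSE`, `PERS` and `NUM` properties and the default value `prop` (following the English Resource Grammar's SF values prop, ques and comm) are assumptions, not actual NorGram output.

```python
SENTENCE_TYPES = {"prop": "declarative", "ques": "interrogative", "comm": "imperative"}


def sentence_type(mrs):
    """Read the sentence force off the SF feature of the MRS index (the main event variable)."""
    sf = mrs["VARS"][mrs["INDEX"]].get("SF", "prop")  # assumed default: proposition
    return SENTENCE_TYPES.get(sf, "unknown")


# Hypothetical variable properties for "Henter han bilen sin?".
henter_han_bilen_sin = {
    "INDEX": "e2",
    "VARS": {"e2": {"SF": "ques", "TENSE": "pres"}, "x4": {"PERS": "3", "NUM": "sg"}},
}

if __name__ == "__main__":
    print(sentence_type(henter_han_bilen_sin))  # interrogative
```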

Two of four transfer outputs

Norwegian transfer input

One of four English transfer outputs

Generator output from the chosen transfer output

Transfer formalism (Stephan Oepen)

The form of a transfer rule:

C = context

I = input

F = filter

O = output
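Read as a rewrite rule, C and I must both be found in the MRS, I is replaced by O while C is left in place, and the rule is blocked if F matches. The Python sketch below follows that reading but simplifies heavily: it matches bare predicate names in a bag and ignores variables, handles and unification, all of which the real formalism handles. The class and function names are hypothetical, and the predicate names `_bekk_n` and `_creek_n` only imitate the naming pattern of `_tur_n` and `_gå_v` on the slides.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TransferRule:
    """C: context, I: input, F: filter, O: output (bare predicate names in this simplification)."""
    input: list
    output: list
    context: list = field(default_factory=list)
    filter: list = field(default_factory=list)


def apply_rule(rule, preds):
    """Return the rewritten bag of predicates (sorted), or None if the rule does not apply."""
    bag = Counter(preds)
    if Counter(rule.input + rule.context) - bag:
        return None  # some input or context predicate is missing
    if any(p in bag for p in rule.filter):
        return None  # the filter matches, so the rule is blocked
    rest = bag - Counter(rule.input)  # the input is consumed; the context stays in place
    return sorted((rest + Counter(rule.output)).elements())


# Lexical rule: no context, no filter, only the predicate is replaced.
bekk_to_creek = TransferRule(input=["_bekk_n"], output=["_creek_n"])

if __name__ == "__main__":
    print(apply_rule(bekk_to_creek, ["_bekk_n"]))  # ['_creek_n']
    print(apply_rule(bekk_to_creek, ["_elv_n"]))   # None: no input match
```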

Simple example: a lexical transfer rule, transferring bekk into creek

No context, no filter, only the predicate is replaced.

Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip.

In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
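The context condition depends on a shared variable, not just on the presence of _tur_n somewhere in the MRS. The Python sketch below checks that the _tur_n EP's index is the verb's second argument before rewriting _gå_v. The dictionary encoding, the role names `ARG0` and `ARG2`, and the function name are assumptions; translating _tur_n itself into _trip_n would presumably be the job of a separate lexical rule and is not done here.

```python
def transfer_ga_en_tur(eps):
    """Rewrite _gå_v as _take_v when a _tur_n EP supplies its second argument (the context)."""
    tur_indices = {ep["args"]["ARG0"] for ep in eps if ep["pred"] == "_tur_n"}
    result = []
    for ep in eps:
        if ep["pred"] == "_gå_v" and ep["args"].get("ARG2") in tur_indices:
            ep = {**ep, "pred": "_take_v"}  # input replaced by output; context left untouched
        result.append(ep)
    return result


# Hypothetical EPs for "gå en tur" (only the parts relevant to the rule).
ga_en_tur = [
    {"pred": "_gå_v", "args": {"ARG0": "e2", "ARG1": "x4", "ARG2": "x6"}},
    {"pred": "_tur_n", "args": {"ARG0": "x6"}},
]

if __name__ == "__main__":
    print([ep["pred"] for ep in transfer_ga_en_tur(ga_en_tur)])  # ['_take_v', '_tur_n']
```

Building a new dictionary instead of mutating the matched EP keeps the source MRS intact, which matters when several transfer outputs are produced from the same input.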

The SEM-I (Semantic Interface)

Documentation of the external semantic interface of a grammar, crucial for the writer of transfer rules.

To enforce maintenance of the SEM-I, LOGON parsing fails if every parse contains at least one predicate not in the SEM-I.
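The SEM-I check can be sketched as a filter over parses, each reduced here to its list of predicate names. The slide only states that parsing fails when every parse contains a predicate outside the SEM-I; that non-conforming parses are also dropped when some parses do conform is an assumption of this sketch, as are the function name and the small SEM-I fragment.

```python
def semi_filter(parses, semi):
    """Keep parses whose predicates all appear in the SEM-I; return None ("fail") if none do."""
    conforming = [parse for parse in parses if set(parse) <= semi]
    return conforming or None


# Hypothetical SEM-I fragment.
SEMI = {"_def_q", "_katt_n", "_sove_v"}

if __name__ == "__main__":
    parses = [["_def_q", "_katt_n", "_sove_v"], ["_def_q", "_katt_n", "_sova_v"]]
    print(semi_filter(parses, SEMI))                   # only the first parse survives
    print(semi_filter([["_katt_n", "_sova_v"]], SEMI))  # None: parsing fails
```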

A small section of the verb part of the NorGram SEM-I

Size of the Norwegian SEM-I: slightly less than 6000 entries

Parse Selection

Parsing, transfer and generation may each give many solutions, leading to a fan-out tree:

The outputs at each of the three stages are statistically ranked.
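One way to keep the fan-out tree manageable is to rank and prune after each stage. The Python sketch below assumes that each stage returns candidates with log-scores, that scores are summed along a path, and that only the `beam` best paths survive each stage (a beam search). The ranking models are not described here, so this pruning strategy, the identifiers P1, T1 and so on, the scores and the alternative translation 'Is he fetching his car?' are all invented for illustration.

```python
def lookup(table):
    """Build a toy stage from a dict mapping an input to [(output, log_score), ...]."""
    return lambda item: table.get(item, [])


def fan_out(source, stages, beam=2):
    """Expand the fan-out tree stage by stage, keeping the `beam` best-scoring paths."""
    paths = [((source,), 0.0)]
    for stage in stages:
        expanded = [
            (path + (output,), score + step_score)
            for path, score in paths
            for output, step_score in stage(path[-1])
        ]
        expanded.sort(key=lambda p: p[1], reverse=True)  # rank the outputs of this stage
        paths = expanded[:beam]
    return paths


# Hypothetical analyses, transfers and realizations with made-up log-scores.
parse = lookup({"Henter han bilen sin?": [("P1", -0.25), ("P2", -1.5)]})
transfer = lookup({"P1": [("T1", -0.25), ("T2", -1.0)], "P2": [("T3", -0.5)]})
generate = lookup({
    "T1": [("Does he fetch his car?", -0.5)],
    "T2": [("Is he fetching his car?", -0.25)],
    "T3": [("Does he fetch the one of the car?", -0.25)],
})

if __name__ == "__main__":
    for path, score in fan_out("Henter han bilen sin?", [parse, transfer, generate]):
        print(score, path[-1])  # -1.0 Does he fetch his car?  then  -1.5 Is he fetching his car?
```

With a beam of 2, the second parse (P2) is pruned after transfer, so its realization never reaches the final list; a wider beam trades speed for the chance of recovering it.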

Example of a four-way ambiguity:

Det regnet 'It rained'/'It calculated'/'That one calculated'/'That rain'

The Parsebanker: efficient treebank building by discriminants

Developer: Paul Meurer, Bergen

Predecessors in discriminant analysis:

• David Carter (1997)

• Stephan Oepen, Dan Flickinger et al. (2003)
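Discriminants are properties that distinguish competing analyses; the annotator accepts or rejects them until one analysis remains. The Python sketch below illustrates the idea on the four readings of Det regnet, with each reading reduced to a hand-written set of properties. These property labels and function names are hypothetical. The Parsebanker is described as working on packed representations, so its discriminants are presumably computed from those rather than from an enumerated list of readings as here.

```python
# Hypothetical properties of the four analyses of "Det regnet".
READINGS = {
    "It rained": {"regne = weather verb", "det = expletive subject", "clause"},
    "It calculated": {"regne = 'calculate'", "det = pronominal subject", "clause"},
    "That one calculated": {"regne = 'calculate'", "det = demonstrative subject", "clause"},
    "That rain": {"regnet = noun", "det = determiner", "noun phrase"},
}


def discriminants(readings):
    """Properties that hold of some but not all of the remaining readings."""
    props = list(readings.values())
    return set().union(*props) - set.intersection(*props)


def decide(readings, discriminant, holds=True):
    """Keep the readings that agree with the annotator's decision on `discriminant`."""
    return {name: p for name, p in readings.items() if (discriminant in p) == holds}


if __name__ == "__main__":
    print(sorted(decide(READINGS, "regne = 'calculate'")))  # two readings remain
    print(list(decide(READINGS, "det = expletive subject")))  # ['It rained']
```

Accepting the single discriminant "det = expletive subject" is enough to isolate 'It rained', whereas "regne = 'calculate'" still leaves two readings that need a further decision.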

Packed representations and discriminants (Paul Meurer)

Clicking on one discriminant is in this case sufficient to select a unique solution:

'After all, a human being must be something more than a machine?'

TigerSearch

The implementation is under development by Paul Meurer

Find selected prepositional phrases with sentential objects:

Find selected prepositional phrases with the preposition 'om' and nominal objects:

Find topicalized objects: