An Intuitive Representation of Human Languages for Translation
Gábor Prószéky, MorphoLogic & Faculty of Information Technology, Pázmány University
Kalmár Workshop, Szeged, October 1-2, 2003

Mar 31, 2015

Transcript
Page 1: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

An Intuitive Representation of Human Languages for Translation

Gábor Prószéky

MorphoLogic & Faculty of Information Technology, Pázmány University

Kalmár Workshop, Szeged, October 1-2, 2003

Page 2: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Contents

Some words on Prof. Kalmár’s activity in computational linguistics

Problems of human language description with formal tools

A new representation with patterns

Introduction to machine translation methods

Application of patterns to translation

Kalmár Workshop 2003

Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

Page 3: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Kalmár & languages

Kalmár’s paper in formal language theory: „An Intuitive Representation of Context-Free Languages”

Kalmár’s activity in machine translation (conference in 1962): „Representation of Languages with the Help of Mathematical Structures”


Page 4: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Linguistic representation problems of the 60’s

Dependency structure

Constituent structure

X-bar theory: X’ → (P) X (Q)

Related structures

Using transformations


Page 5: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Structured symbols

Linguistic categories: atomic symbols

Not enough: subcategorization

Semantic features: ± alive, ...

Syntactic features: ± countable, ...

Rule sets instead of rules

ID/LP


Page 6: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Feature structures

DAGs

Unification problems

Feature geometry, typed features

LFG, GPSG, HPSG

Parsing: CF-skeleton + features or feature structures only?
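
As a rough illustration of the unification mentioned on this slide, here is a minimal Python sketch. It assumes feature structures are plain nested dictionaries with string values, so it covers tree-shaped structures only and does not model the shared substructures (reentrancy) of real DAGs. The feature names `POS`, `AGR`, `NUM` and `PER` are illustrative and do not come from the talk.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible values."""


def unify(fs1, fs2):
    """Return the unification of two feature structures (nested dicts)."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)                      # copy so the inputs stay untouched
        for feature, value in fs2.items():
            if feature in result:
                result[feature] = unify(result[feature], value)
            else:
                result[feature] = value
        return result
    if fs1 == fs2:                              # identical atomic values unify
        return fs1
    raise UnificationFailure(f"{fs1!r} conflicts with {fs2!r}")


noun = {"POS": "n", "AGR": {"NUM": "sg"}}
subject_slot = {"AGR": {"NUM": "sg", "PER": "3"}}
unified = unify(noun, subject_slot)
# unified == {"POS": "n", "AGR": {"NUM": "sg", "PER": "3"}}
```

Unifying `{"POS": "n"}` with `{"POS": "v"}` raises `UnificationFailure`, which is the kind of clash the later slides on conflict handling are concerned with.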


Page 7: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Complexity of NL grammars

RG/FSA: not enough

CF/RTN: not enough

CS ?

0/ATN: Turing Machine

Transformations and metarules

Arguments for and against


Page 8: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

NL grammar formalisms

Competence and performance?

Kornai number (left-recursion, center-embedding, “respectively” construction)

Gradually from unrestricted to regular

(i) a^n b^n -> a*b* (n is lost!)

(ii) a^n b^n -> {ε, ab, aabb, aaabbb}

“Finitization” by length

No structure in FSA; finite systems, however, can produce structural output
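
The two ways of restricting a^n b^n can be compared in a few lines of Python. This is a sketch only: the function name `anbn_up_to` and the length bound 6 are assumptions, chosen so that the result is the four-element set shown in (ii).

```python
import re


def anbn_up_to(max_length):
    """List the strings a^n b^n whose total length does not exceed max_length."""
    return ["a" * n + "b" * n for n in range(max_length // 2 + 1)]


# (ii) finitization by length: the a/b dependency survives, but only up to the bound
finite_language = anbn_up_to(6)          # ["", "ab", "aabb", "aaabbb"]; "" stands for ε

# (i) the regular approximation a*b* forgets that the two counts must be equal
regular_approximation = re.compile(r"a*b*")
overgenerated = regular_approximation.fullmatch("aab") is not None   # True
in_finite = "aab" in finite_language                                  # False
```

The unbalanced string `aab` is accepted by the regular approximation but is absent from the finite list, which is the sense in which "n is lost" in (i) and kept in (ii).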


Page 9: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Syntax and semantics

Logical representations (e.g. λx.dog(x), λx.run(x))

World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF)

Categorial grammar: early logical representations of syntax (Kalmár)

DCG: interpretation & representation

Rule-to-rule hypothesis


Page 10: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Conflict handling

Lexicon meets syntax: who is right?

Lexicon: off-line info coming from past experiences

Which is more important in a specific situation?


Page 11: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Open classes

Open vs. closed classes: that is, features can or cannot be overridden

Proper names, jabbers, folk etymology, loanwords, ...

Grammar of closed classes: minimal grammar


Page 12: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Finite morphology

Finite patterns

Finite number of entries

Descriptions assigned to entries

Finite & open vs. infinite & closed

Underspecified entries for guessing


Page 13: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Finite syntax

“Item and arrangement” (as in morphology)

“Arrangement” describes a rather free constituent-order

Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ ‘DAN’

Cascades without loop
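
One possible reading of "metawords in a meta-dictionary" and "cascades without loop" is sketched below in Python. Everything in it is an assumption made for illustration: the two dictionary levels, the category names, and the longest-match strategy are not taken from the talk. The point is only that each level is a finite lookup table and that the levels are applied once each, in a fixed order, with no recursion.

```python
# Hypothetical meta-dictionaries: each level maps a sequence of categories
# (a "metaword" such as Det-Adj-N, i.e. 'DAN') to the category of the phrase.
LEVELS = [
    {("Det", "Adj", "N"): "NP", ("Det", "N"): "NP", ("N",): "NP"},  # level 1
    {("Prep", "NP"): "PP"},                                          # level 2
]


def apply_level(categories, dictionary):
    """One left-to-right pass, preferring the longest metaword at each position."""
    longest = max(len(key) for key in dictionary)
    output, i = [], 0
    while i < len(categories):
        for size in range(min(longest, len(categories) - i), 0, -1):
            key = tuple(categories[i:i + size])
            if key in dictionary:
                output.append(dictionary[key])
                i += size
                break
        else:                       # no metaword starts here: copy the category
            output.append(categories[i])
            i += 1
    return output


def cascade(categories):
    """Apply every level exactly once, in order (a cascade without loops)."""
    for dictionary in LEVELS:
        categories = apply_level(categories, dictionary)
    return categories


result = cascade(["Prep", "Det", "Adj", "N", "V", "Det", "N"])
# result == ["PP", "V", "NP"]
```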


Page 14: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

The „plastic box”

John is a boy. "John" is a noun.

Go is a verb. "Go" is a verb.

[symbol] is a sign. "[symbol]" is a sign.

□ is a □.

(where □ is a "plastic box")


Page 15: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Real examples

(a) Unusual use: Go is a verb.
POS [np]  POS [v]

(b) Metaphor: My car drinks a lot.
ANIMATE [+]  ANIMATE [-]

(c) Unknown entry: Kalmár is a family name.
POS [np]
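
The three cases above can be mimicked with a small Python sketch. It assumes, purely for illustration, that the value required by the context overrides the lexical value and that the clash is reported to the caller. The lexicon contents and the function name `describe` are hypothetical.

```python
# Hypothetical mini-lexicon; the entries and feature names are illustrative.
LEXICON = {
    "go": {"POS": "v"},
    "car": {"POS": "n", "ANIMATE": "-"},
}


def describe(word, required):
    """Merge lexical features with the features the context requires.

    Assumption: on conflict the contextual value wins, and the clash is
    returned so a caller can treat it as unusual use or metaphor.
    """
    lexical = LEXICON.get(word.lower(), {})   # unknown entry: no lexical features
    conflicts = {
        feature: (lexical[feature], value)
        for feature, value in required.items()
        if feature in lexical and lexical[feature] != value
    }
    return {**lexical, **required}, conflicts


unusual_use = describe("Go", {"POS": "np"})      # (a) lexicon says v, context says np
metaphor = describe("car", {"ANIMATE": "+"})     # (b) lexicon says -, context says +
unknown = describe("Kalmár", {"POS": "np"})      # (c) nothing in the lexicon to override
```

In case (c) the conflict dictionary is empty: an unknown word simply takes the description its context assigns to it.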


Page 16: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Linguistic frames

Psychology: "Gestalt"

Morphologically complex structures treated as frames by humans

Frames in AI: ‘shopping’, ‘walking’, ...

As ‘high-level parsing’ relates to ‘detailed on-line analysis’


Page 17: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Translation of human languages

old problems (50’s)

direct (60’s)

interlingual (70’s)

transfer (80’s)

examples (90’s)


Page 18: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.


Patterns: general linguistic information in lexicalized form

Short, fully specified patterns are: lexical entries

Longer, fully specified entries are: multi-word expressions

Partially underspecified patterns are: collocations, phrasal verbs, idioms

Totally underspecified patterns are: linguistic rules

Pattern/interpretation pairs: Translation Description Language
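
The scale from fully specified to totally underspecified patterns can be illustrated with one matcher. The Python sketch below assumes a pattern is a list whose elements are either a literal word form (fully specified) or a dictionary of required features (underspecified). The sample patterns, the token format and the feature names are hypothetical, and this is not the syntax of the Translation Description Language.

```python
def element_matches(element, token):
    """A string must equal the word form; a dict must be a subset of the token's features."""
    if isinstance(element, str):
        return token["form"] == element
    return all(token.get(feature) == value for feature, value in element.items())


def pattern_matches(pattern, tokens):
    """A pattern matches a token sequence of the same length, element by element."""
    return len(pattern) == len(tokens) and all(
        element_matches(element, token) for element, token in zip(pattern, tokens)
    )


LEXICAL_ENTRY = ["dog"]                                   # short, fully specified
MULTI_WORD = ["of", "course"]                             # longer, fully specified
PHRASAL_VERB = [{"lemma": "give"}, "up"]                  # partially underspecified
RULE = [{"POS": "det"}, {"POS": "adj"}, {"POS": "n"}]     # totally underspecified

gave_up = [
    {"form": "gave", "lemma": "give", "POS": "v"},
    {"form": "up", "lemma": "up", "POS": "adv"},
]
the_old_dog = [
    {"form": "the", "lemma": "the", "POS": "det"},
    {"form": "old", "lemma": "old", "POS": "adj"},
    {"form": "dog", "lemma": "dog", "POS": "n"},
]

phrasal_hit = pattern_matches(PHRASAL_VERB, gave_up)      # True
rule_hit = pattern_matches(RULE, the_old_dog)             # True
```

The same function handles a lexical entry and a syntactic rule; they differ only in how much of each element is spelled out.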

Page 19: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.


The MetaMorpho principles

No single words but contextual expressions (in form of patterns) only

Pattern pairs: input/interpretation structure pairs

Single pass: no separate transfer steps

Target structure generation: by-product of parsing

Page 20: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Jabberwocky

‘Twas brillig, and the slighty toves
Did gyre and gimble in the wabe:
All mimsy were the borogroves,
And the mone raths outgrabe.


Page 21: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

‘Twas □, and the □ □s
Did □ and □ in the □:
All □ were the □s,
And the □ □s □.


Page 22: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Translation rules for Jabberwocky

‘twas □            □ volt
□, and □           □, és □
the □s did □       a □ok □tak
□ and □            □ és □
in the □           a □ban
all □              teljesen □
□ were the □s      □k voltak az □ok
the □s □           a □ok □tek
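
A source/target pattern pair of this kind can be applied mechanically. The Python sketch below writes the slot symbol (the "plastic box") as □ and fills each target slot with the word captured by the corresponding source slot. The function name `apply_rule` is hypothetical, slots are assumed to match single words, and neither transliteration nor Hungarian vowel harmony is modeled, so the output is a raw slot filling (for instance "a wabeban") rather than the polished form of the final translation.

```python
import re

SLOT = "□"   # stands for the slot symbol used in the rules


def apply_rule(source, target, text):
    """Fill the target pattern's slots from text, or return None if source does not match."""
    parts = re.split(f"({re.escape(SLOT)})", source)
    regex = "".join(r"(\w+)" if part == SLOT else re.escape(part) for part in parts)
    match = re.fullmatch(regex, text)
    if match is None:
        return None
    fillers = iter(match.groups())           # captured words, in source order
    return "".join(next(fillers) if char == SLOT else char for char in target)


locative = apply_rule("in the □", "a □ban", "in the wabe")        # "a wabeban"
adverbial = apply_rule("all □", "teljesen □", "all mimsy")        # "teljesen mimsy"
no_match = apply_rule("in the □", "a □ban", "all mimsy")          # None
```

Because the target side is produced as soon as the source side matches, no separate transfer step is needed for the matched fragment.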


Page 23: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

‘Twas □, and the □ □s
Did □ and □ in the □:
All □ were the □s,
And the □ □s □.

□ volt, és a □ □ok
□tak és □tek a □ben:
teljesen □ voltak a □ok
és a □ □ok □tek.


Page 24: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

Translation of Jabberwocky

Dzsebervoki

Brillig volt, és a szlájti tóvok
gájertak és gimbeltek a vébben:
teljesen mimszik voltak a borogróvok
és a món rátok autgrébtek.


Page 25: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

An intuitive representation...

1. X-bar based structures

2. Feature-based descriptions

3. Metarules (used off-line)

4. Rule-to-rule principle

5. Lexicon should be finite but open

6. Closed classes belong to the minimal grammar

7. Minimal grammar describes "basically" linguistic elements


Page 26: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

An intuitive representation... (cont’d)

8. Linguistic constructions can be described by finite patterns

9. A huge & finite description set is used rather than a limited & infinite grammar

10. In case of conflict, lexical information is either redundant or in contradiction with the actual description

11. Known constructions need no real-time analysis (Gestalt, frame)


Page 27: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.

An intuitive representation... (cont’d)

12. ”Broken” frames are analyzed real-time

13. Structural (source/target) pattern pair is assigned to every frame to be translated

14. Target structure is computed while parsing source structure


Page 28: An Intuitive Representation of Human Languages for Translation Gábor Prószéky MorphoLogic& Faculty of Information Technology, Pázmány University Kalmár.
