
Jul 28, 2020


  • Computer languages for representing grammatical information

    an introduction to XMG

    Benoit Crabbé

    Lattice — Université Paris 7

    Computer languages for representing grammatical information 1

  • Outline of the talk

    We wish to identify the goals and the requirements of a “high level” language for representing grammatical information.

    To do this:

      • We introduce the main ideas of the XMG language (used for TAG and IG).
      • We briefly compare it with the XLE language of KDK 04 and with some others.
      • Motivation: to separate the core problems of these languages from those that are TAG-specific.

    Framework: a metagrammar is a computer-only language dedicated to easing the expression of computational grammars (without any theoretical motivation).


  • Where does XMG come from?

    Initially designed to implement a large-scale TAG grammar for French. (Mostly “historical”) motivations:

      • No real large-scale French grammar was available or usable.
      • Designed to ease the implementation of a large TAG grammar capable of handling semantics.

    Primary extensions and developments:

      • Augmented to handle an additional TAG+ formalism: Interaction Grammars (Perrier 02).
      • Augmented with semantics (Gardent and Parmentier).
      • Used to implement multilingual grammars (German, Korean, Yiddish?).


  • Plan

    Historical perspective:

      • High-level languages vs low-level languages (e.g. ANLT)
      • Alternations and structure sharing in the “old languages” (PATR II)
      • Tree-based formalisms: TAG (Becker’s metarules)

    Major change since then (removing lexical rules)

    Current languages (goals, similarities, comparison):

      • Overview of the XMG language
      • Overview of the XLE language (KDK 04)
      • How to be declarative?

    New problems encountered:

      • Naming!
      • Interaction between modules


  • High Level vs Low Level languages (ANLT)

    From John Carroll’s PhD thesis (1993), a reference GPSG implementation.

    A low-level language:

      • Well defined (syntax and semantics): it is the interface with the parsing algorithms.
      • E.g. CFG + atomic feature structures . . . coded as integers.

    A high-level language (dubbed metagrammar, as in GKPS):

      • Principles: head feature propagation, valency, slash (unbounded dependencies)
      • + metarules (which express alternations such as active / passive)

    Why two languages?

      • The low-level language is interfaced with a parser and does not change. However, it lacks expressivity for the linguist.
      • The high-level language is more “sloppy”. It gives the linguist more expressive power.
      • The high-level language is more flexible: it may change according to the linguist’s needs.


  • Structure sharing and alternations (PATR II)

    PATR II (Shieber 85)

    What do high-level languages look like?

      • The simplest high-level language
      • A computer language without any theoretical goal
      • Data structure = feature structures

    It can express:

      • Structure sharing (parametrized macros), which makes it possible to implement inheritance hierarchies:

          VERB : <cat> = v
          VERB-3RD-SING : VERB
                          <…> = singular
                          <…> = 3

      • Alternations (lexical rule: yields a new passive entry, out, from the active entry, in):

          Passive : <…> = <…>
                    <…> = past-prt
                    <…> = <…>
                    <…> = <…>
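    The two PATR II devices can be sketched in a few lines of Python. This is only an illustration, not PATR II syntax: feature structures are plain nested dicts, a macro is a function that adds equations to an entry, and the lexical rule builds a new entry from an existing one. All feature names other than cat (agr, num, per, form, subj, obj, by_obj) are assumptions made for the example.

```python
from copy import deepcopy


def verb(fs):
    """Macro VERB: adds the equation <cat> = v."""
    fs["cat"] = "v"
    return fs


def verb_3rd_sing(fs):
    """Macro VERB-3RD-SING: inherits VERB, then adds agreement values.

    The agreement paths are hypothetical; the slide does not show them.
    """
    verb(fs)
    fs["agr"] = {"num": "singular", "per": 3}
    return fs


def passive(entry_in):
    """Lexical rule: derive a new 'out' entry from the 'in' (active) entry."""
    out = deepcopy(entry_in)               # the active entry is left untouched
    out["form"] = "past-prt"
    out["subj"] = out.pop("obj")           # the active object becomes the subject
    out["by_obj"] = deepcopy(entry_in["subj"])  # the active subject is demoted
    return out


eats = verb_3rd_sing({
    "subj": {"cat": "n", "role": "agent"},
    "obj": {"cat": "n", "role": "patient"},
})
eaten = passive(eats)
```

    Note that the lexical rule removes and overwrites information (pop, reassignment), which is what makes it a procedural device.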


  • Tree Based formalisms at the low level (TAG)

    Elementary trees, in labelled bracket notation:

    (a) Canonical context: [S N↓ [V mange] N↓]
        Jean mange des biscuits / John eats the cookies

    (b) Plural context: [S N↓ [V mangent] N↓]
        Les enfants mangent des biscuits / The children eat the cookies

    (c) Passivised context: [S N↓ [V’ V↓ [V mangés]] [PP [P par] N↓]]
        Les biscuits sont mangés par les enfants / The cookies are eaten by the children

    (d) Clitic argument context: [S N↓ [V’ Cl↓ [V mangés]]]
        Les enfants les ont mangés / The children have eaten them

    (e) Passivised context with wh-extraction: [S [PP [P par] N↓] [S N↓ [V’ V↓ [V mangés]]]]
        Par quels enfants les biscuits sont-ils mangés ? / By which children are the cookies eaten?


  • Example TAG : high level apparatus (Becker 93)

    Structure sharing: canonical trees are organised in an inheritance hierarchy (like PATR macros).

    Metarules express alternations (≈ transformations, compiled offline prior to parsing and restricted to a local domain), yielding additional elementary trees. They are procedural and have been shown to be undecidable.

    LR-PASSIVE: [S N↓0 V N↓1] → [S N↓1 [V [V être] V⋄(mode=ppart)] [PP [P par] N↓0]]

    LR-CLITIC(OBJ): [S V N↓] → [S [V Cl↓ V]]
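    As a rough illustration of what “compiled offline” means, the two metarules can be written as tree-rewriting functions and applied until no new tree appears. This is a minimal sketch, not Becker’s formalism: trees are nested tuples, node labels are plain strings, and the helper names (lr_passive, lr_clitic_obj, close) are made up for the example.

```python
# A tree is a tuple (label, child, ...); a leaf is a plain string.

def lr_passive(tree):
    """[S N0 V N1] -> [S N1 [V [V être] V(ppart)] [PP [P par] N0]], else None."""
    if len(tree) != 4 or tree[0] != "S" or tree[2] != "V":
        return None
    _, n0, _, n1 = tree
    return ("S", n1,
            ("V", ("V", "être"), "V[mode=ppart]"),
            ("PP", ("P", "par"), n0))


def lr_clitic_obj(tree):
    """[S ... V N ...] -> [S ... [V Cl V] ...], else None."""
    label, *kids = tree
    for i in range(len(kids) - 1):
        nxt = kids[i + 1]
        if kids[i] == "V" and isinstance(nxt, str) and nxt.startswith("N↓"):
            return (label, *kids[:i], ("V", "Cl↓", "V"), *kids[i + 2:])
    return None


def close(trees, rules):
    """Apply every rule to every tree until no new tree is produced."""
    seen = set(trees)
    todo = list(trees)
    while todo:
        tree = todo.pop()
        for rule in rules:
            new = rule(tree)
            if new is not None and new not in seen:
                seen.add(new)
                todo.append(new)
    return seen


canonical = ("S", "N↓0", "V", "N↓1")
family = close([canonical], [lr_passive, lr_clitic_obj])
```

    Each rule inspects and rewrites a finished tree, so whether a rule fires depends on which rules were applied before it. In this toy case neither rule matches the output of the other, and the closure terminates; nothing guarantees that in general.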


  • Nowadays : observations

    Observations drawn from the following bibliography:

      • TAG: XMG material
      • LFG: M. Dalrymple, R. Kaplan, and T.H. King, 2004, Linguistic Generalizations over Descriptions, Proceedings of the LFG’04 Conference, CSLI Publications, pp. 199-208.

    Observation #1: similar devices, if we abstract away from the data structures (trees and features)

      • Keep Shieber-like macros
      • Reject lexical rules (provide an alternative declarative device)

    Observation #2: specific devices for tree-based formalisms

      • Naming! (TAG, HPSG)
      • Interactions (TAG)

    Problem faced: lexical rules are a procedural device (they can add, remove and update information). This raises practical problems of rule ordering when developing non-toy grammars.

    Solution: suppress lexical rules and provide an alternative declarative device for expressing the same information.


  • What for ? Goals of these languages

    These are just computer languages geared towards implementation. There is no theory behind them: it is up to the user to use the language to build their own theory or grammar.

    Goals and requirements for such a language:

      • To ease grammatical development. Grammar development is not an easy task: the grammar writer needs a language that is as simple (and as expressive) as possible.
      • Factoring out information. Computational grammars are naturally expressed in a redundant fashion (low-level language). One wants to capture generalisations in modules.
      • Content of a module: a module contains partial grammatical information.
      • Composition: the language must supply a commutative, associative operation for combining modules, so that ordering issues can be ignored.
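    Unification of feature structures is one operation with the required properties. The sketch below is an assumption-laden toy (nested dicts with atomic leaves, no reentrancy, hypothetical module contents), meant only to show that combining modules in any order gives the same result.

```python
from itertools import permutations


def unify(a, b):
    """Unify two feature structures; return None if their values clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            if key in out:
                merged = unify(out[key], val)
                if merged is None:
                    return None
                out[key] = merged
            else:
                out[key] = val
        return out
    return a if a == b else None


def combine(modules):
    """Fold unify over a sequence of modules, starting from the empty structure."""
    result = {}
    for module in modules:
        result = unify(result, module)
        if result is None:
            return None
    return result


# Three partial descriptions of the same entry (contents are hypothetical).
subject = {"subj": {"cat": "n", "case": "nom"}}
active = {"verb": {"voice": "active"}, "subj": {"role": "agent"}}
agreement = {"subj": {"num": "sg"}, "verb": {"num": "sg"}}

results = [combine(order) for order in permutations([subject, active, agreement])]
```

    Every permutation yields the same structure, so the grammar writer never has to decide which module is applied first. Each module only adds information; none removes or overwrites any.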


  • Modularisation

    TAG = trees ; LFG = feature structures; HPSG = partially typed FS

    Example (TAG): identifying a potential module.

      [S N↓ V⋄ N↓]
      Jean mange des biscuits / John eats cookies

      [N N* [S N↓ [S N↓ V⋄]]]
      Les biscuits que Jean mange / The cookies that John eats

    We wish to define a module that encapsulates information that would otherwise be copied across several units. To do this, each module receives a name:

    CANONICAL SUBJECT → [S N↓ V]


  • Combining modules

    Using an “or” and an “and” (XLE, XMG). Example (XMG):

    Disjunction (choice) of descriptions:

    (1) Subject → CanonicalSubject ∨ RelativisedSubject

    The subject is either canonical or relativised. The disjunction is a choice (nondeterministic interpretation). Implementing it as a choice simulates an alternation that the literature expresses with a lexical rule moving a canonical subject to its relative counterpart in a derived structure. (See also HPSG: ConTroll)

    Conjunction of descriptions:

    (2) IntransitiveVerb → Subject ∧ ActiveForm

    A conjunction of descriptions is interpreted as the syntactic conjunction of two tree descriptions (logical formulae) in which node names are renamed (TAG), or as the unification of feature structures (LFG).
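    The two operators can be sketched over tree descriptions seen as sets of constraints. This is not XMG syntax: the constraint vocabulary (dom, prec, anchor), the node names and the helpers desc, disj and conj are all invented for the example, and node names are simply shared globally instead of being renamed.

```python
from itertools import product

# A description is a frozenset of constraints (a conjunction of atoms).
# A module denotes a list of alternative descriptions, one per choice.


def desc(*constraints):
    """A module with a single alternative."""
    return [frozenset(constraints)]


def disj(*modules):
    """Choice: keep every alternative of every disjunct."""
    return [alt for module in modules for alt in module]


def conj(*modules):
    """Conjunction: pick one alternative per conjunct and join the formulae."""
    return [frozenset().union(*combo) for combo in product(*modules)]


canonical_subject = desc(("dom", "S", "N"), ("dom", "S", "V"), ("prec", "N", "V"))
relativised_subject = desc(("dom", "Nr", "Nf"), ("dom", "Nr", "S"),
                           ("dom", "S", "N"), ("dom", "S", "V"),
                           ("prec", "N", "V"))
active_form = desc(("dom", "S", "V"), ("anchor", "V"))

subject = disj(canonical_subject, relativised_subject)   # rule (1)
intransitive_verb = conj(subject, active_form)           # rule (2)
```

    Here intransitive_verb holds two descriptions, one per kind of subject, and each contains the anchor constraint contributed by active_form. Because conjunction is a set union, the order of the conjuncts does not affect the set of descriptions obtained.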


  • Valuation of the grammar

    Last step: valuation of the grammar. This amounts to generating all the solutions of a (non-recursive) logic program: provide a module as the axiom of a non-recursive grammar and generate all the solutions (tree descriptions can be seen as the words of the language generated by that grammar).

    Sample axiom: INTRANSITIVEVERB

      IntransitiveVerb → Subject ∧ ActiveForm
      Subject → CanonicalSubject ∨ RelativisedSubject

    [S N↓ V] (Le garçon. . . / The boy. . .) ∧ [S V⋄] (dort / sleeps)
      yields [S N↓ V⋄] (Le garçon dort / The boy sleeps)

    [N N* [S N↓ V]] ((Le garçon) qui. . . / (The boy) who. . .) ∧ [S V⋄] (dort / sleeps)
      yields [N N* [S N↓ V⋄]] (Le garçon qui dort / The boy who sleeps)
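    The enumeration of solutions from an axiom can be sketched as follows. It is a toy under stated assumptions: the rule table encoding and the solutions function are invented for the example, and a solution is just the set of fragment names to be combined, not an actual tree.

```python
from itertools import product

# Non-recursive grammar: each rule is ("and" | "or", part, part, ...).
# Names without a rule are terminal fragments.
RULES = {
    "IntransitiveVerb": ("and", "Subject", "ActiveForm"),
    "Subject": ("or", "CanonicalSubject", "RelativisedSubject"),
}


def solutions(name):
    """Yield every solution of `name` as a frozenset of terminal fragment names."""
    if name not in RULES:
        yield frozenset([name])
        return
    op, *parts = RULES[name]
    if op == "or":
        # Choice: the solutions of any one part.
        for part in parts:
            yield from solutions(part)
    else:
        # Conjunction: one solution per part, joined together.
        for combo in product(*(list(solutions(part)) for part in parts)):
            yield frozenset().union(*combo)


result = set(solutions("IntransitiveVerb"))
```

    The axiom IntransitiveVerb has exactly two solutions, matching the two trees obtained on this slide. The recursion terminates because the grammar is non-recursive.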


  • So you said you were declarative. . .

    Disjunction makes it possible to manipulate tree sets. Lexical rules can be viewed as operators that yield tree sets from canonical trees. Thus:

    S

    N↓ V’ N↓

    V⋄

    Jean voit Marie

    S

    V’ N↓

    V