Interaction Grammars and their implementation in LEOPAR Guy Perrier University Nancy2 - LORIA (Nancy)

Mar 28, 2015

Transcript
Page 1: Interaction Grammars and their implementation in LEOPAR Guy Perrier University Nancy2 - LORIA (Nancy)

Interaction Grammars and their implementation in LEOPAR

Guy Perrier

University Nancy2 - LORIA (Nancy)

Page 2

1 - Why a new linguistic formalism?

• Some crucial points in the design of a linguistic formalism:

o The form of the basic building blocks,

o The composition rules,

o The syntax-semantics interface.

• Among the usual formalisms, none prevails over all the others.

Page 3

1 - Why a new linguistic formalism?

• The originality of Interaction Grammars (CoLing 2000):

o For the syntax, the basic building blocks are underspecified trees represented in the form of tree descriptions (this aspect comes from formalisms stemming from TAG);

o The composition of underspecified trees to build completely specified trees is performed by superposition under the control of a polarity system. Polarity neutralisation expresses the saturation of syntactic structures (this aspect comes from Categorial Grammars).

Page 4

2 - The importance of an experimental approach

• The relevance of a linguistic formalism can only be demonstrated by confronting it with real corpora.

• The development of the LEOPAR parser addresses this goal.

• Scaling up requires two conditions:

o parsing algorithms that are efficient enough to overcome the explosion of ambiguity that comes with it;

o lexicons and grammars with large coverage.

Page 5

3 - The formalism of Interaction Grammars

• The basic syntactic objects are tree descriptions: a tree description is a set of relations and properties on tree nodes representing syntactic constituents.

• Relations are (immediate and large) dominance relations or (immediate and large) precedence relations.

• Nodes are labelled with feature structures describing properties of syntactic constituents. Feature values are atoms or disjunctions of atoms, and a value can be shared by several features.

Page 6

3 - The formalism of Interaction Grammars

• Features are polarized:

o negative features (f ← v) represent expected resources;

o positive features (f → v) represent available resources;

o neutral features (f = v) represent properties which do not behave as consumable resources.
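
A minimal Python sketch of polarized features, given only as an illustration. The names `Polarity`, `Feature` and `are_dual` are assumptions made for this example and are not taken from LEOPAR. Two features are treated as dual when they carry the same name, opposite non-neutral polarities and compatible values.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    NEGATIVE = -1  # expected resource
    POSITIVE = 1   # available resource
    NEUTRAL = 0    # property that is not a consumable resource


@dataclass(frozen=True)
class Feature:
    name: str
    values: frozenset  # one atom, or several atoms for a disjunction
    polarity: Polarity


def are_dual(a, b):
    """True if a and b could neutralise each other (illustrative definition)."""
    return (a.name == b.name
            and a.polarity is not Polarity.NEUTRAL
            and a.polarity.value + b.polarity.value == 0
            and bool(a.values & b.values))


# Hypothetical features: a verb expecting a noun phrase, a noun providing one,
# and a neutral gender feature whose value is a disjunction of atoms.
expected_np = Feature("cat", frozenset({"np"}), Polarity.NEGATIVE)
provided_np = Feature("cat", frozenset({"np"}), Polarity.POSITIVE)
gender = Feature("gen", frozenset({"m", "f"}), Polarity.NEUTRAL)
```

Here `are_dual(expected_np, provided_np)` holds, while a neutral feature such as `gender` is never dual to anything.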

Page 7

3 - The formalism of Interaction Grammars

• A syntactic description represents an underspecified syntactic tree. In other words, it represents a family of syntactic trees which are the models of the description.

• Among all models of a description, only neutral and minimal models are linguistically relevant:

o In a neutral model, every negative feature is neutralised by a positive feature and conversely.

o A minimal model adds a minimum of information to the description.

Page 8

3 - The formalism of Interaction Grammars

• Neutral and minimal models of a description are built by iterating the operation of feature neutralisation: this operation consists in merging two nodes labelled with two dual features (f ← v and f → v).

• The neutralisation of two features entails a partial tree superposition, obtained by propagating the constraints that define the description.
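
The node-merging part of neutralisation can be sketched as follows. This is a simplified illustration under stated assumptions, not the LEOPAR algorithm: a node is a plain dictionary from feature name to a pair (polarity, set of values), with polarity -1, 0 or +1; the function name `merge_nodes` is hypothetical; and the propagation of dominance and precedence constraints to the rest of the tree is left out.

```python
def merge_nodes(a, b):
    """Superpose two nodes; return the merged node, or None on a clash."""
    merged = {}
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            # A feature present on one node only is copied unchanged.
            merged[name] = a[name] if name in a else b[name]
            continue
        (pol_a, vals_a), (pol_b, vals_b) = a[name], b[name]
        values = vals_a & vals_b   # shared feature values must be compatible
        polarity = pol_a + pol_b   # -1 and +1 cancel out to 0
        if not values or abs(polarity) > 1:
            return None            # incompatible values, or two equal polarities
        merged[name] = (polarity, values)
    return merged


# Hypothetical nodes: one expects a singular noun phrase, the other provides
# a masculine noun phrase whose number is still a disjunction.
expected = {"cat": (-1, {"np"}), "num": (0, {"sg"})}
provided = {"cat": (+1, {"np"}), "num": (0, {"sg", "pl"}), "gen": (0, {"m"})}
merged = merge_nodes(expected, provided)
```

After the merge, the `cat` feature has polarity 0, and `num` is restricted to the value common to both nodes. Merging two nodes that both provide the same resource returns `None`.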

Page 9

3 - Modelling of syntactic phenomena in French

• Barriers to extraction

o L’invitation que Jean demande à Marie ("the invitation that Jean asks Marie for")

o L’invitation que Jean pense demander à Marie ("the invitation that Jean thinks of asking Marie for")

o * L’invitation que Marie connaît Jean qui demande ("*the invitation that Marie knows Jean who asks for")

• Pied piping

o A la femme de qui Jean demande-t-il une invitation ? ("Whose wife does Jean ask for an invitation?")

o A la femme de qui Jean pense-t-il demander une invitation ? ("Whose wife does Jean think of asking for an invitation?")

• Negation (ne … personne, ne … aucun)

o Personne ne demande une invitation à Marie. ("Nobody asks Marie for an invitation.")

o Jean ne demande aucune invitation à Marie. ("Jean asks Marie for no invitation.")

o Jean ne demande une invitation à personne. ("Jean asks nobody for an invitation.")

o Jean ne demande une invitation à la femme d’aucun ingénieur. ("Jean asks no engineer's wife for an invitation.")

Page 10

4 - Principle of the LEOPAR parser

• LEOPAR is developed within the Calligramme team by Guillaume Bonfante, Bruno Guillaume, Sylvain Pogodalla and Guy Perrier.

• This work started in 2003. After a first release of the parser, a second release is now available. It includes 17,000 lines of OCaml code.

• The parser is freely downloadable under the CeCILL licence at:

http://www.loria.fr/equipes/calligramme/leopar/download.html

Page 11

4 - Principle of the LEOPAR parser

Parsing of the sentence: Jean a demandé une invitation à Marie ("Jean asked Marie for an invitation")

Steps shown: tokenization, then lexical selection.

[Figure: lexical selection for the tokens Jean, a, demandé, une, invitation, à, Marie. Candidate description labels shown include ProperNoun, Avoir, N0VN1aN2, N0VS1aN2, N0VN1, InfCompl, StandardDet, CommonN, NaN1deN2 and VerbPrep. The per-token counts shown (1, 20, 1, 4, 4, 1 and 8) multiply to 2560 candidate lexical selections.]
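
The figure 2560 on this slide can be checked with a short computation, assuming that each number shown in the figure is the candidate count of one token (the transcript does not say which count belongs to which token).

```python
import math

# Candidate counts per token, as read from the slide; the pairing of counts
# with tokens is not recoverable from the transcript.
counts = [1, 20, 1, 4, 4, 1, 8]

# A lexical selection picks one candidate per token, so the number of
# selections is the product of the counts.
selections = math.prod(counts)
print(selections)  # prints 2560
```

A single token with 20 candidates already multiplies the search space by 20, which is why filtering comes before parsing.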

Page 12

4 - Principle of the LEOPAR parser

Input filtering

[Figure: the 2560 candidate lexical selections are filtered. The labels kept are ProperNoun, Avoir, N0VN1aN2, StandardDet, CommonN, VerbPrep and ProperNoun; the label Avoir is listed three times, and 3 selections remain.]

Page 13

4 - Principle of the LEOPAR parser

Parsing

[Figure: parse tree built from the 3 remaining selections for the sentence "Jean a demandé une invitation à Marie". Node labels shown: S, NP, Aux, V, Det, N, PP, Prep.]

Page 14

5 - Input filtering

• Principle: for every input choice (one lexical selection per word), a parse can exist only if the polarity balance is zero for every feature and for every feature value.

• This is a global input filtering criterion.

• For every feature value, we build an automaton which counts polarities. A path in the automaton represents an input choice, and we keep it only if the polarity balance is zero along this path for the feature value considered.

• Because feature values can take the form of disjunctions, the automaton can be nondeterministic. It is determinised by computing possible polarity intervals instead of precise values.

• Filtering can be improved in different ways: bounding polarity intervals, using specific properties of coordination, adding probabilities.
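
The counting criterion can be sketched for a single feature value as follows. This is an illustrative reconstruction, not LEOPAR's code: the toy lexicon, the labels and the function name `filter_selections` are assumptions, and disjunctive values (which need polarity intervals) are ignored. States of the automaton are pairs (position, balance); a path is kept only if it ends with balance zero.

```python
# Hypothetical toy lexicon for one feature value (say cat = np): each word
# maps candidate descriptions to their polarity count for that value
# (+1 per positive feature, -1 per negative feature).
CANDIDATES = {
    "Jean": {"ProperNoun": +1},
    "demande": {"Intransitive": -1, "Transitive": -2},
    "Marie": {"ProperNoun": +1},
}


def filter_selections(words, candidates):
    """Return the lexical selections whose polarity balance ends at zero."""
    # forward[i]: balances reachable after reading the first i words.
    forward = [{0}]
    for w in words:
        forward.append({b + c for b in forward[-1]
                        for c in candidates[w].values()})
    # alive[i]: balances at position i from which a final balance of zero
    # can still be reached.
    alive = [set() for _ in forward]
    alive[-1] = forward[-1] & {0}
    for i in range(len(words) - 1, -1, -1):
        alive[i] = {b for b in forward[i]
                    if any(b + c in alive[i + 1]
                           for c in candidates[words[i]].values())}
    # Enumerate only the paths that stay inside the surviving states.
    paths = [((), 0)] if 0 in alive[0] else []
    for i, w in enumerate(words):
        paths = [(sel + (name,), b + c)
                 for sel, b in paths
                 for name, c in sorted(candidates[w].items())
                 if b + c in alive[i + 1]]
    return [sel for sel, _ in paths]


kept = filter_selections(["Jean", "demande", "Marie"], CANDIDATES)
```

With this toy lexicon only the selection using the `Transitive` entry survives, because the intransitive reading leaves one positive polarity unmatched. Pruning states rather than enumerating every selection is what keeps the filter cheap when the number of selections is large.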

Page 15

6 - Parsing

• The principle is to build a neutral and minimal model of the syntactic description corresponding to every path in the automaton.

• The current strategy implemented in LEOPAR is a left-to-right strategy. In order to reduce the search space, a bound is put on the number of active polarities allowed during the parsing process.

• The automaton is visited from left to right. If the number of active polarities in the current description is under the bound, we take a shift step in the automaton, extending the current description. Otherwise, we take a reduce step: we bring the number of active polarities back under the bound by performing neutralisations.
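
The shift/reduce loop described above can be sketched on a toy example. This is a simplification under stated assumptions, not the LEOPAR implementation: a description is reduced to its list of active polarities, written as pairs (value, sign); a neutralisation just removes one dual pair; tree constraints are ignored; and the lexicon `POLARITIES` and the function names are hypothetical.

```python
# Hypothetical polarities contributed by each token.
POLARITIES = {
    "Jean": [("np", +1)],
    "dort": [("np", -1), ("s", +1)],
    ".": [("s", -1)],
}


def find_dual_pair(active):
    """Return the indices of the first two dual polarities, or None."""
    for i, (value_i, sign_i) in enumerate(active):
        for j in range(i + 1, len(active)):
            value_j, sign_j = active[j]
            if value_i == value_j and sign_i == -sign_j:
                return i, j
    return None


def parse_trace(words, polarities, bound):
    """Return the steps taken by a bounded left-to-right shift/reduce loop."""
    active, steps, position = [], [], 0
    while position < len(words) or active:
        if position < len(words) and len(active) < bound:
            # Shift: under the bound, read the next word.
            active.extend(polarities[words[position]])
            steps.append(("shift", words[position]))
            position += 1
        else:
            # Reduce: neutralise one dual pair, or give up if there is none.
            pair = find_dual_pair(active)
            if pair is None:
                steps.append(("fail", None))
                return steps
            value = active[pair[0]][0]
            for k in sorted(pair, reverse=True):
                del active[k]
            steps.append(("reduce", value))
    return steps


trace = parse_trace(["Jean", "dort", "."], POLARITIES, bound=3)
blocked = parse_trace(["Jean", "dort", "."], POLARITIES, bound=1)
```

With `bound=3` the loop shifts two words, neutralises `np`, shifts the last token and neutralises `s`. With `bound=1` it fails right after the first shift: the bound forbids reading the word that would supply the dual polarity, which illustrates how a bound can make the strategy miss a parse.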

Page 16

6 - Parsing

• The strategy has two drawbacks: because of the bound on the number of active polarities, it is not complete and, in order to avoid producing the same solution several times, the sequence of neutralisations must respect a fixed order.

• The parsing efficiency can be improved by using a top-down strategy.

• Robustness can be taken into account by using a bottom-up strategy.

Page 17

7 - Lexical and grammatical resources with large coverage

• Building and maintaining large lexicons and grammars requires reconciling the size of such resources with linguistic (readability) and computational (efficiency) constraints.

• These resources should be as reusable as possible for other formalisms.

• All the resources which we produce are freely available.

Page 18

8 - A lexicon independent of the formalism

• The lexicon used by the parser is not built directly; it results from the combination of a morpho-syntactic lexicon independent of the formalism with a grammar written in the formalism of Interaction Grammars.

• The morpho-syntactic lexicon results from the combination of a morphological lexicon with a syntactic lexicon.

• We have built a syntactic lexicon with 400 entries in order to test LEOPAR on the French sentences of the TSNLP (Test Suite for Natural Language Processing).

• In joint work with Claire Gardent, Bruno Guillaume and Ingrid Falk, we have designed a method to extract a lexicon from the LADL tables. With this method, we have produced a lexicon from 11 tables and 2000 verbs.
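
One way to picture the two combinations is as successive lookups. The sketch below is only an assumption about how such a combination could work: the entries, the pairing of frame labels with lemmas, the description identifiers `d1` to `d4` and the function names are all hypothetical, and the real resources are much richer.

```python
# Hypothetical morphological lexicon: inflected form -> (lemma, category).
MORPHOLOGICAL = {
    "demande": [("demander", "V")],
    "invitation": [("invitation", "N")],
}

# Hypothetical syntactic lexicon: lemma -> frame labels.
SYNTACTIC = {
    "demander": ["N0VN1aN2", "N0VN1"],
    "invitation": ["CommonN"],
}

# Hypothetical grammar index: frame label -> tree description identifiers.
GRAMMAR = {
    "N0VN1aN2": ["d1", "d2"],
    "N0VN1": ["d3"],
    "CommonN": ["d4"],
}


def morpho_syntactic(form):
    """Combine the morphological and syntactic lexicons for one form."""
    return [(lemma, cat, frame)
            for lemma, cat in MORPHOLOGICAL.get(form, [])
            for frame in SYNTACTIC.get(lemma, [])]


def parser_entries(form):
    """Combine the morpho-syntactic entries with the grammar."""
    return [description
            for _, _, frame in morpho_syntactic(form)
            for description in GRAMMAR.get(frame, [])]
```

Only `GRAMMAR` depends on the formalism here; the two lexicons could be reused with a grammar written in another formalism, which is the point made on this slide.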

Page 19

9 - A two-level grammar: source and object

• The principle is to consider two levels for the grammar:

o A source grammar is written by a human in a high-level language well suited to the expression of linguistic regularities.

o The source grammar is compiled into an object grammar which is directly usable in an NLP system.

• Denys Duchier, Joseph Le Roux, Yannick Parmentier and Benoît Crabbé (LORIA) have developed a grammatical description language associated with a compiler. The system is called XMG (eXtensible MetaGrammar).

• We used XMG to produce a French interaction grammar (740 descriptions).

Page 20

10 - Prospects

• To develop more efficient parsing strategies which integrate robustness.

• To integrate semantics.

• To extend the coverage of the French grammar.

• To improve the efficiency of the parser by using statistics.