Interaction Grammars and their implementation in LEOPAR
Guy Perrier
University Nancy2 - LORIA (Nancy)
Mar 28, 2015
1 - Why a new linguistic formalism?
• Some crucial points in the design of a linguistic formalism:
o the form of the basic bricks,
o the composition rules,
o the syntax-semantics interface.
• None of the usual formalisms prevails over all the others.
• The originality of Interaction Grammars (CoLing 2000):
o For syntax, the basic bricks are underspecified trees represented in
the form of tree descriptions (this aspect comes from formalisms
stemming from TAG);
o The composition of underspecified trees to build completely specified trees is
performed by superposition under the control of a polarity system.
Polarity neutralisation expresses the saturation of syntactic structures (this
aspect comes from Categorial Grammars).
2 - The importance of an experimental
approach
• The relevance of a linguistic formalism can only be demonstrated by confronting it with
real corpora.
• The development of the LEOPAR parser serves this ambition.
• The change of scale imposes two requirements:
o parsing algorithms efficient enough to overcome the resulting explosion
of ambiguity;
o lexicons and grammars with large coverage.
3 - The formalism of Interaction
Grammars
• The basic syntactic objects are tree descriptions: a tree description is a set of
relations and properties on tree nodes representing syntactic constituents.
• Relations are (immediate and large) dominance relations or (immediate and large)
precedence relations.
• Nodes are labelled with feature structures describing properties of syntactic
constituents. Feature values are atoms or disjunctions of atoms, and they can be shared
by several features.
• Features are polarized:
o negative features (f ← v) represent expected resources;
o positive features (f → v) represent available resources;
o neutral features (f = v) represent properties which do not behave as
consumable resources.
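The three kinds of polarized features can be pictured as a small data type. The Python sketch below is only an illustration: the names `Polarity`, `Feature` and `is_active`, as well as the example features, are assumptions of this sketch and are not taken from LEOPAR.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    NEGATIVE = "negative"  # expected resource
    POSITIVE = "positive"  # available resource
    NEUTRAL = "neutral"    # property, not a consumable resource


@dataclass(frozen=True)
class Feature:
    name: str           # feature name f, e.g. "cat"
    values: frozenset   # an atom, or a disjunction of atoms
    polarity: Polarity

    def is_active(self) -> bool:
        """A feature is active while it still expects or offers a resource."""
        return self.polarity is not Polarity.NEUTRAL


# Hypothetical features, for illustration only.
expects_np = Feature("cat", frozenset({"np"}), Polarity.NEGATIVE)
offers_np = Feature("cat", frozenset({"np"}), Polarity.POSITIVE)
number = Feature("num", frozenset({"sg", "pl"}), Polarity.NEUTRAL)

# Only the non-neutral features count as active polarities.
active = [f for f in (expects_np, offers_np, number) if f.is_active()]
```

Keeping the value as a set makes atom disjunctions (here `sg` or `pl`) a natural special case of the same representation.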
• A syntactic description represents an underspecified syntactic tree. In other words, it
represents a family of syntactic trees which are the models of the description.
• Among all models of a description, only neutral and minimal models are
linguistically relevant:
o A neutral model realizes the neutralisation of every negative feature with a
positive feature and conversely.
o A minimal model adds a minimum of information to the description.
• The construction of neutral and minimal models for a description is performed by
iterating the operation of feature neutralisation: this operation consists in merging
two nodes labelled with two dual features (f ← v and f → v).
• The neutralisation of two features entails a partial tree superposition by
propagating the constraints that define the description.
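Node merging under neutralisation can be sketched on feature structures alone. This is a simplified illustration that ignores the dominance and precedence constraints propagated by a real superposition. The encoding of polarities as "+", "-" and "=", the function names and the example nodes are all assumptions of this sketch.

```python
from typing import Optional

# A node is a dict: feature name -> (polarity, set of admissible values),
# with polarity "+" (positive), "-" (negative) or "=" (neutral).


def combine_polarities(p: str, q: str) -> Optional[str]:
    """Combine the polarities of two occurrences of the same feature."""
    if p == "=":
        return q
    if q == "=":
        return p
    if {p, q} == {"+", "-"}:
        return "="      # a positive and a negative feature neutralise
    return None         # "+" with "+" or "-" with "-": no merge


def merge_nodes(a: dict, b: dict) -> Optional[dict]:
    """Merge two nodes, or return None if their features are incompatible."""
    merged = {}
    for name in a.keys() | b.keys():
        if name in a and name in b:
            (pa, va), (pb, vb) = a[name], b[name]
            polarity = combine_polarities(pa, pb)
            values = va & vb            # disjunctions are intersected
            if polarity is None or not values:
                return None
            merged[name] = (polarity, values)
        else:
            merged[name] = a[name] if name in a else b[name]
    return merged


# Hypothetical nodes: a slot expecting a singular np, and a proper noun.
subject_slot = {"cat": ("-", {"np"}), "num": ("=", {"sg"})}
proper_noun = {"cat": ("+", {"np"}), "num": ("=", {"sg", "pl"})}

merged = merge_nodes(subject_slot, proper_noun)
```

In this example the dual `cat` features neutralise each other and the `num` disjunction is narrowed to `sg`, while merging two nodes that both offer `cat` fails.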
3 - Modelling of syntactic phenomena
in French
• Barriers to extraction
o L’invitation que Jean demande à Marie
o L’invitation que Jean pense demander à Marie
o * L’invitation que Marie connaît Jean qui demande
• Pied piping
o A la femme de qui Jean demande-t-il une invitation ?
o A la femme de qui Jean pense-t-il demander une invitation ?
• Negation (ne … personne, ne … aucun)
o Personne ne demande une invitation à Marie.
o Jean ne demande aucune invitation à Marie.
o Jean ne demande une invitation à personne.
o Jean ne demande une invitation à la femme d’aucun ingénieur.
4 - Principle of the LEOPAR parser
• LEOPAR is developed within the Calligramme team by Guillaume Bonfante,
Bruno Guillaume, Sylvain Pogodalla and Guy Perrier.
• This work started in 2003. After a first release of the parser, a second release is
now available. It comprises 17,000 lines of OCaml code.
• The parser is freely downloadable under the CeCILL licence at:
http://www.loria.fr/equipes/calligramme/leopar/download.html
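Parsing of the sentence: Jean a demandé une invitation à Marie
Steps shown: tokenization, then lexical selection.
[Figure: lexical selection for the tokens Jean / a / demandé / une / invitation / à / Marie. Each token is mapped to its candidate lexical entries (labels visible on the slide: ProperNoun, Avoir, N0VN1, N0VS1aN2, N0VN1aN2, InfCompl, StandardDet, NaN1deN2, CommonN, VerbPrep). The numbers of candidates multiply to 1 x 8 x 20 x 1 x 4 x 4 x 1 = 2560 possible selections.]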
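The figure of 2560 is simply the product of the numbers of candidate entries per word. The short check below recomputes it; the pairing of each count with a particular word is an assumption read off the slide layout, and the variable names are hypothetical.

```python
from math import prod

words = ["Jean", "a", "demandé", "une", "invitation", "à", "Marie"]
# Candidate entries per word; the word-to-count pairing is assumed.
candidates_per_word = [1, 8, 20, 1, 4, 4, 1]

# One lexical selection picks exactly one candidate for every word.
selections = prod(candidates_per_word)
print(selections)  # 2560
```

Even for a seven-word sentence, a few ambiguous words are enough to produce thousands of selections, which is why filtering precedes parsing.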
Input filtering
[Figure: the lexical selection automaton before and after filtering. Of the 2560 selections, 3 remain. The labels on the filtered automaton are ProperNoun, Avoir (listed three times), N0VN1aN2, StandardDet, CommonN, VerbPrep and ProperNoun.]
Parsing
[Figure: the filtered automaton (3 selections) and the resulting parse tree for "Jean a demandé une invitation à Marie", with node labels S, NP, PP, V, Aux, Det, N and Prep.]
5 - Input filtering
• Principle: for a given input choice, there is a parse only if the polarity balance is
zero for every feature and for every feature value.
• This is a global input filtering criterion.
• For every feature value, we build an automaton which counts polarities. A path in the
automaton represents an input choice, and we keep it only if the polarity balance is zero
along this path for the feature value considered.
• Because feature values can take the form of disjunctions, the automaton can be
nondeterministic. It is determinised by computing possible polarity intervals instead
of exact values.
• Filtering can be improved in different ways: bounding polarity intervals, using specific
properties of coordination, adding probabilities.
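The filtering criterion can be illustrated with a naive sketch that enumerates the input choices explicitly instead of building automata, so it shows the criterion but not LEOPAR's actual algorithm. Each candidate entry contributes, for a given feature value, an interval (low, high) of possible polarity counts: an exact count c is written (c, c), and a disjunctive value gives a wider interval. The entry names, the sign convention (+1 for positive, -1 for negative) and the toy lexicon are assumptions of this sketch.

```python
from itertools import product


def filter_selections(choices_per_word, keys):
    """Keep the selections whose polarity interval contains 0 for every key.

    choices_per_word: one dict per word, entry name -> {key: (low, high)}.
    keys: the (feature, value) pairs to check.
    """
    kept = []
    for selection in product(*(c.items() for c in choices_per_word)):
        names = tuple(name for name, _ in selection)
        feasible = True
        for key in keys:
            low = sum(counts.get(key, (0, 0))[0] for _, counts in selection)
            high = sum(counts.get(key, (0, 0))[1] for _, counts in selection)
            if not low <= 0 <= high:   # a zero balance is impossible
                feasible = False
                break
        if feasible:
            kept.append(names)
    return kept


NP = ("cat", "np")
# Hypothetical three-word input.
choices = [
    {"ProperNoun": {NP: (1, 1)}},
    {"Intransitive": {NP: (-1, -1)}, "Transitive": {NP: (-2, -2)}},
    {"ProperNoun": {NP: (1, 1)}, "NpOrS": {NP: (0, 1)}},  # NpOrS: value np|s
]

kept = filter_selections(choices, [NP])
```

Here the selection with an intransitive verb between two proper nouns is discarded because its balance for `cat = np` is +1. The interval test is only a necessary condition: a kept selection may still fail to parse.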
6 - Parsing
• The principle is to build a neutral and minimal model of the syntactic description
corresponding to every path in the automaton.
• The current strategy implemented in LEOPAR is a left-to-right strategy. In
order to reduce the search space, a bound is put on the number of active
polarities allowed during the parsing process.
• The automaton is visited from left to right. If the number of active polarities in the
current description is under the bound, we take a shift step in the automaton,
extending the current description. Otherwise, we take a reduce step: we bring
the number of active polarities back under the bound by performing neutralisations.
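The alternation of shift and reduce steps can be sketched on a drastically simplified model in which a description is just a list of active polarities and a neutralisation removes one dual pair. Tree constraints, backtracking and the choice among several possible neutralisations are all left out, and every name below is an assumption of this sketch.

```python
def find_dual_pair(active):
    """Return the indices of two dual polarities, or None if there are none."""
    for i, (sign_i, value_i) in enumerate(active):
        for j in range(i + 1, len(active)):
            sign_j, value_j = active[j]
            if value_i == value_j and sign_i == -sign_j:
                return i, j
    return None


def shift_reduce(words, bound):
    """words: one list of (sign, value) polarities per word, sign in {+1, -1}.

    Returns the sequence of steps taken, or None when the analysis fails.
    """
    active, trace, position = [], [], 0
    while position < len(words) or active:
        if position < len(words) and len(active) < bound:
            active.extend(words[position])   # shift: read the next word
            position += 1
            trace.append("shift")
        else:
            pair = find_dual_pair(active)    # reduce: one neutralisation
            if pair is None:
                return None
            i, j = pair
            del active[j]                    # j > i, so delete j first
            del active[i]
            trace.append("reduce")
    return trace


# Hypothetical input: noun phrase, transitive verb, noun phrase.
sentence = [[(+1, "np")], [(-1, "np"), (-1, "np")], [(+1, "np")]]

steps = shift_reduce(sentence, bound=3)
too_tight = shift_reduce(sentence, bound=1)
```

With a bound of 3 the sentence is analysed, whereas with a bound of 1 the sketch fails because no neutralisation is possible before the verb has been read. This is a small instance of how a bound on active polarities can make a strategy lose solutions.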
• The strategy has two drawbacks: because of the bound on the number of active
polarities, it is not complete and, to avoid producing the same solution
several times, the sequence of neutralisations must respect a fixed order.
• Parsing efficiency can be improved by using a top-down strategy.
• Robustness can be taken into account by using a bottom-up strategy.
7 - Lexical and grammatical resources
with large coverage
• Building and maintaining large lexicons and grammars requires
reconciling the size of such resources with linguistic (readability) and computational
(efficiency) constraints.
• These resources should be reusable as much as possible for other formalisms.
• All the resources which we produce are freely available.
8 - A lexicon independent of the
formalism
• The lexicon used by the parser is not built directly: it results from the
combination of a morpho-syntactic lexicon independent of the formalism with
a grammar written in the formalism of Interaction Grammars.
• The morpho-syntactic lexicon results from the combination of a morphological
lexicon with a syntactic lexicon.
• We have built a syntactic lexicon with 400 entries in order to test LEOPAR on the
French sentences of the TSNLP (Test Suites for Natural Language Processing).
• In joint work with Claire Gardent, Bruno Guillaume and Ingrid Falk, we have
designed a method to extract a lexicon from the LADL tables. With this method,
we have produced a lexicon from 11 tables and 2000 verbs.
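The two combinations described above (morphological with syntactic lexicon, then morpho-syntactic lexicon with grammar) can be pictured as successive lookups. The sketch below uses hypothetical toy resources: the entry labels are borrowed from the lexical selection slide, but their pairing with lemmas, the dictionary layout and the function names are assumptions, and tree descriptions are replaced by placeholder strings.

```python
# Hypothetical toy resources; real entries are far richer.
morphological_lexicon = {
    "demandé": {"lemma": "demander", "morph": "past participle"},
    "Jean": {"lemma": "Jean", "morph": "proper noun"},
}
syntactic_lexicon = {          # lemma -> names of syntactic frames
    "demander": ["N0VN1aN2", "N0VS1aN2"],
    "Jean": ["ProperNoun"],
}
grammar = {                    # frame name -> tree description (placeholder)
    "N0VN1aN2": "<tree description for N0VN1aN2>",
    "N0VS1aN2": "<tree description for N0VS1aN2>",
    "ProperNoun": "<tree description for ProperNoun>",
}


def morpho_syntactic_entries(form):
    """Formalism-independent step: inflected form -> (lemma, morph, frame)."""
    entry = morphological_lexicon[form]
    return [
        (entry["lemma"], entry["morph"], frame)
        for frame in syntactic_lexicon[entry["lemma"]]
    ]


def parser_entries(form):
    """Formalism-specific step: attach the grammar's description to a frame."""
    return [
        (lemma, morph, grammar[frame])
        for lemma, morph, frame in morpho_syntactic_entries(form)
    ]


entries = parser_entries("demandé")
```

Only `grammar` depends on Interaction Grammars in this picture, which is what would allow the first two resources to be reused with another formalism.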
9 - A two-level grammar:
source and object
• The principle is to consider two levels for the grammar:
o A source grammar is written by a human in a high-level language well suited
to the expression of linguistic regularities.
o The source grammar is compiled into an object grammar which is
directly usable in an NLP system.
• Denys Duchier, Joseph Le Roux, Yannick Parmentier and Benoît Crabbé (LORIA)
have developed a grammatical description language together with its compiler.
The system is called XMG (eXtensible MetaGrammar).
• We used XMG to produce a French interaction grammar (740 descriptions).
10 - Prospects
• To develop more efficient parsing strategies which integrate robustness.
• To integrate semantics.
• To extend the coverage of the French grammar.
• To improve the efficiency of the parser by using statistics.