
Robust rule-based parsing


Estelle Delpech

Material of the Natural Language Processing (NLP) Workshop with STIC-Asia representatives and the Nepal team.
August 30-31, 2007.
Patan Dhoka, Lalitpur, Nepal.
Transcript
Page 1: Robust rule-based parsing

Robust rule-based parsing (quick overview)

I. Robustness
II. Three robust rule-based parsers of English
III. Common features
IV. Example: identification of subjects in Syntex

Page 2: Robust rule-based parsing

I. Robustness (Aït-Mokhtar et al. 1997)

« the ability to provide useful analyses for real-world input text. By useful analyses, we mean analyses that are (at least partially) correct and usable in some automatic task or application »

This implies:
- one analysis (even partial) for any real-world input
- the ability to process irregular input and to overcome errors
- analysis efficiency

Page 3: Robust rule-based parsing

I. Types of robust parsers (Aït-Mokhtar et al. 1997)

- parsers based on traditional theoretical models, with rule-based and/or stochastic post-processing: Minipar (Lin 1995)
- stochastic parsers: Charniak's parser (2000)
- rule-based parsers: Non-Projective Dependency Parser (Järvinen & Tapanainen 1997), Syntex (Bourigault 2007), Cass (Abney 1990, 1995)

Most parsers are hybrid.

Page 4: Robust rule-based parsing

II.1 Non-Projective Dependency Parser (Tapanainen & Järvinen 1997)

Pipeline: Tagged Text → Syntactic Labeling → Selection of syntactic links → Pruning → OUTPUT

- Syntactic Labeling: « all legitimate surface-syntactic labels are added to the set of morphological readings »
- Selection of syntactic links: « syntactic rules discard contextually illegitimate alternatives or select legitimate ones », using valency/subcategorization information
- Pruning: general heuristics disambiguate the last of the syntactic links

Page 5: Robust rule-based parsing

II.1 Non-Projective Dependency Parser (Tapanainen & Järvinen 1997)

If the preceding word is an unambiguous auxiliary, the current word is the subject of this auxiliary:

SELECT (@SUBJ) IF (1C AUXMOD HEAD);

Rules are contextual: How do you do ? (AUX: the first "do"; SUBJ: "you")

Rules use syntactic links established by preceding rules

Rules establish dependency links between words
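To make the rule mechanism concrete, here is a minimal Python sketch of such a contextual selection rule. It is an illustration only, not the parser's actual constraint formalism: the token dictionaries and label names are assumptions made for the example.

# Minimal sketch (not the parser's actual formalism): a contextual rule that
# selects the @SUBJ reading of a token when the preceding token is an
# unambiguous auxiliary. Tokens are dicts carrying a set of candidate labels.

def select_subj_after_aux(tokens):
    """Discard all labels but @SUBJ when the previous word is unambiguously AUX."""
    for i, tok in enumerate(tokens[1:], start=1):
        prev = tokens[i - 1]
        prev_is_unambiguous_aux = prev["labels"] == {"AUX"}
        if prev_is_unambiguous_aux and "@SUBJ" in tok["labels"]:
            tok["labels"] = {"@SUBJ"}          # keep only the legitimate reading
            tok["head"] = i - 1                # dependency link to the auxiliary
    return tokens

# "How do you do ?" -- after tagging/labeling, "you" still carries several readings
sentence = [
    {"form": "How", "labels": {"@ADVL"}},
    {"form": "do",  "labels": {"AUX"}},              # unambiguous auxiliary
    {"form": "you", "labels": {"@SUBJ", "@OBJ"}},    # ambiguous
    {"form": "do",  "labels": {"@MAIN"}},
    {"form": "?",   "labels": {"PUNCT"}},
]
select_subj_after_aux(sentence)
print(sentence[2])   # {'form': 'you', 'labels': {'@SUBJ'}, 'head': 1}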

Page 6: Robust rule-based parsing

II.2 Syntex (Bourigault 2007)

Pipeline: Tagged Text → Verb Chunk → non-recursive NP → non-recursive SP → Prep Attachment → Object, Subject → OUTPUT

- Verb Chunk: he will leave
- non-recursive NP: the man, happy tree friends
- non-recursive SP: from Paris
- Prep Attachment and Object/Subject identification use endogenous and exogenous subcategorization information (e.g. does "from Paris" attach to "is" or to "the man" in "This is the man from Paris"?)

Page 7: Robust rule-based parsing

II.2 Syntex (Bourigault 2007)

One module per syntactic relation. Each module processes the sentence from left to right.

Those who think they are interested in water supply must vote

Like in the Non-Projective Dependency Parser, the rules:
- establish dependency relations between words
- are contextual
- use syntactic links established by preceding rules

The identification of a dependency link is formulated as a « path » to be followed up through the existing links and grammatical categories, from governor to dependent or from dependent to governor. Ambiguous relations: selection of potential governors + disambiguation with probabilities.
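As an illustration of the last point, here is a minimal Python sketch of probability-based governor selection. The data structures and the probability values are assumptions made for the example, not Syntex's actual implementation.

# Minimal sketch (assumed representation): an ambiguous prepositional attachment
# is resolved by choosing, among the candidate governors, the one with the
# highest subcategorization probability for the preposition.

def attach_preposition(prep, candidates, subcat_probs):
    """candidates: list of (word, lemma) pairs found to the left of `prep`;
    subcat_probs: dict mapping (governor_lemma, prep) to a probability estimated
    endogenously (from the corpus being parsed) or exogenously (external corpus)."""
    best = max(candidates, key=lambda c: subcat_probs.get((c[1], prep), 0.0))
    return best

# "This is the man from Paris": does "from" attach to "is" or to "man"?
probs = {("man", "from"): 0.31, ("be", "from"): 0.05}   # illustrative numbers only
print(attach_preposition("from", [("is", "be"), ("man", "man")], probs))  # ('man', 'man')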

Page 8: Robust rule-based parsing

II.3 Cass (Abney 1990,1995)

Pipeline: Tagged Text → Chunk filter → Clause filter → Parse filter → OUTPUT

- Chunk filter (NP filter, then chunk filter): builds non-recursive chunks; internal structure remains ambiguous.
  [NP the happy tree friends]
  [SP from [NP the happy tree friends]] [VP will leave]
- Clause filter (raw clause filter, then clause repair filter): finds the beginning and end of simplex clauses and the subject-predicate relation; repair if no subject-predicate relation is found.
  [SUBJ This] [PRED is] [NP the man] [SP from Paris]
- Parse filter: uses subcategorization information and assembles recursive structures.
  [[This] [is] [NP the man] [SP from Paris]]

Page 9: Robust rule-based parsing

II.3 Cass (Abney 1990,1995)

Each filter uses transducers, e.g.:

PP → (Prep|To)+ (NP|Vbg)

Use of repair (also used in Syntex and NPDP, but less explicitly): « when errors become apparent downstream, the parser attempts to repair them »

Each filter makes a decision (determinism), the safest one in case of ambiguity:
« ambiguity is not propagated downstream »
« repair consists in directly modifying erroneous structure without regard to the history of computation that produced the structure »
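To illustrate what one level of such a cascade looks like, here is a minimal Python sketch of a chunk filter implemented as a regular expression over the tag sequence, following the PP pattern above. The tag names and the representation are assumptions made for the example, not Abney's actual Cass code.

# Minimal sketch: one level of a finite-state cascade as a regular expression
# over the part-of-speech sequence, following the pattern PP -> (Prep|To)+ (NP|Vbg).
import re

def pp_filter(tags):
    """tags: list of chunk/POS labels for one sentence. Returns the labels with
    PP chunks substituted in; greedy matching gives the longest chunk at each
    position, so each decision is deterministic."""
    s = " ".join(tags)
    pattern = re.compile(r"((?:Prep|To)\s+)+(?:NP|Vbg)")
    s = pattern.sub("PP", s)
    return s.split()

print(pp_filter(["NP", "Vbd", "Prep", "NP"]))                # ['NP', 'Vbd', 'PP']
print(pp_filter(["NP", "Vbz", "To", "Vbg", "Prep", "NP"]))   # ['NP', 'Vbz', 'PP', 'PP']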

Page 10: Robust rule-based parsing

II.3 Cass (Abney 1990,1995)

Example of repair: In South Australia beds of boulders were deposited …

Erroneous structure output by the Chunk filter:
[SP In [NP South Australia beds]] [SP of [NP boulders]] [VP were deposited]

Raw Clause filter: no subject is found.

The Repair filter tries to find a subject by modifying the structure:
[SP In [NP South Australia]] [NP-SUBJ beds] [SP of boulders] [VP were deposited]
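A minimal Python sketch of this kind of repair, under an assumed chunk representation (not Cass's actual code): when no NP precedes the verb chunk, the last noun of the preceding prepositional chunk is promoted to NP-SUBJ. It only handles the configuration of the example above.

# Minimal sketch: re-bracket the chunk sequence when the raw clause filter
# finds no subject before the verb chunk.

def repair_missing_subject(chunks):
    """chunks: list of (label, words) pairs. Returns a repaired copy when no
    bare NP precedes the first VP; otherwise returns the input unchanged."""
    vp_index = next((i for i, (label, _) in enumerate(chunks) if label == "VP"),
                    len(chunks))
    has_subject = any(label == "NP" for label, _ in chunks[:vp_index])
    if has_subject:
        return chunks                                    # nothing to repair
    repaired = list(chunks)
    label, words = repaired[0]                           # e.g. ("SP", ["In", "South", "Australia", "beds"])
    if label == "SP" and len(words) > 2:
        repaired[0] = ("SP", words[:-1])                 # [SP In South Australia]
        repaired.insert(1, ("NP-SUBJ", [words[-1]]))     # [NP-SUBJ beds]
    return repaired

chunks = [("SP", ["In", "South", "Australia", "beds"]),
          ("SP", ["of", "boulders"]),
          ("VP", ["were", "deposited"])]
print(repair_missing_subject(chunks))
# [('SP', ['In', 'South', 'Australia']), ('NP-SUBJ', ['beds']),
#  ('SP', ['of', 'boulders']), ('VP', ['were', 'deposited'])]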

Page 11: Robust rule-based parsing

III. Common features: incrementality

The parsing task is divided into subtasks:
- reduces the overall complexity of the main task: « factoring the problem into a sequence of small, well defined questions » (Abney 1990)
- problem of circularity: it is difficult to choose in what order the relations should be identified (Bourigault 2007)

The sentence is parsed in several phases, each phase producing an intermediate structure:
- allows each phase to use the syntactic information left by the preceding phase
- « the level of abstraction produced during the 1st phase (...) facilitates the description of deeper syntactic relations » (Aït-Mokhtar et al. 1997)
- ease of maintenance

Page 12: Robust rule-based parsing

III. Common features : determinism and repair

Each parsing phase yields one solution. In case of ambiguity, the safest choice is made, even if some higher-level information is needed:
- ambiguity is not propagated downstream
- most regular errors can be repaired later on

≠ parallelism, backtracking

« The salient performance is not errors vs no errors, but the tradeoff between speed and error rate » (Abney 1990)

Page 13: Robust rule-based parsing

III. Common features: no syntactic theory

Use of common grammatical knowledge; hours of corpus observation to find clues for automatic identification.

Difference between:
- the theoretical study of the syntactic structures of language
- the automatic identification of grammatical relations in real-world texts

Difficulties in automatic syntactic analysis:
- lack of knowledge (semantics/pragmatics for disambiguation)
- deviation from the norm of the language
- errors of preceding processing steps

Page 14: Robust rule-based parsing

III. Common features : implicit grammatical knowledge

Bipartite architecture:
- lexical information
- recognition routines

No independent declaration of grammatical knowledge. Difficult / impossible to set apart:
- grammatical knowledge
- non grammar-based heuristics

No linguist / computer scientist job separation: both linguistic and programming know-how are needed. This is a condition for scalability and robustness.

Page 15: Robust rule-based parsing

IV. Example : the subject relation in Syntex

The identification of the subject relation is formulated as a « path » through the already identified grammatical relations:

the cost of technology takes time to shrink
(Det Noun Prep Noun Verb Noun Prep Noun)
Links already identified: DET, PREP, NOMPREP, OBJ, NOMPREP

Path:
- start from the tensed verb (takes)
- move to the left
- stop when you encounter an ungoverned Noun → SUBJECT = cost
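A minimal Python sketch of this path, under an assumed token representation (not Syntex's actual code):

# Minimal sketch: find the subject of a tensed verb by walking left through the
# tokens and stopping at the first noun not yet governed by any other word.

def find_subject(tokens, verb_index):
    """tokens: list of dicts with 'form', 'pos' and an optional 'governor'
    (index of an already identified governor). Returns the index of the
    subject, or None if no ungoverned noun is found to the left of the verb."""
    for i in range(verb_index - 1, -1, -1):
        tok = tokens[i]
        if tok["pos"] == "Noun" and tok.get("governor") is None:
            tok["governor"] = verb_index        # record the SUBJ link
            return i
    return None

# "the cost of technology takes time to shrink", after the earlier modules
# have attached the determiner and the prepositional phrase:
toks = [
    {"form": "the",        "pos": "Det",  "governor": 1},   # DET -> cost
    {"form": "cost",       "pos": "Noun"},                   # still ungoverned
    {"form": "of",         "pos": "Prep", "governor": 1},   # PREP -> cost
    {"form": "technology", "pos": "Noun", "governor": 2},   # NOMPREP -> of
    {"form": "takes",      "pos": "Verb"},
    {"form": "time",       "pos": "Noun", "governor": 4},   # OBJ -> takes
]
print(find_subject(toks, 4))   # 1  ("cost")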

Page 16: Robust rule-based parsing

IV. Using existing links

The subject might be far from the tensed verb, and lots of configurations are possible:

- Initiatives leading to cessation of smoking in workplaces are adopted
- Those who think they are interested in water supply must vote.
- No reference to the war, or to the alliance, should remain
(island types in these examples: PP, Gerund, Clause, Conj)

Existing links form dependency islands (~syntagms or isolated words). Following up the islands until a reasonable subject is found makes it possible to find subjects without describing all possible configurations or doing too much computing.

Page 17: Robust rule-based parsing

IV. Ambiguities

- Many persons have died in Darfur since the conflict began
- A person sitting on the death row since the age of 16 is not the same as before.
- Many adults believe education equates intelligence.
- Those who think they are interested in water supply must vote.

When to stop? When to follow up? When to repair?

Page 18: Robust rule-based parsing

IV. Path decomposition

At each island, a decision is made by a dedicated sub-module (one type of island = one sub-module):
- stop and identify a subject
- follow up to the island on the left
- stop and return failure (without repair, or with repair)
- change path direction: to the right, or to any other position in the sentence
- call another sub-module

Decisions are encoded as if-then rules that may test:
- local and non-local context: lemmas, morphosyntactic tags, links, presence of commas…
- specific information left by other modules: encountered tags, activated modules…
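A minimal Python sketch of this organisation, with hypothetical sub-modules and an assumed island representation (not Syntex's actual code):

# Minimal sketch: one sub-module per island type; each returns a decision --
# stop with a subject, follow up to the next island on the left, or fail.

STOP, FOLLOW, FAIL = "stop", "follow", "fail"

def pp_module(island, context):
    # a prepositional island cannot itself be the subject: keep moving left
    return FOLLOW, None

def np_module(island, context):
    # an ungoverned noun phrase found on the way is taken as the subject
    if island.get("governor") is None:
        return STOP, island
    return FOLLOW, None

SUB_MODULES = {"PP": pp_module, "NP": np_module}   # one type of island = one sub-module

def walk_islands(islands, context=None):
    """islands: dependency islands listed right-to-left from the tensed verb."""
    for island in islands:
        module = SUB_MODULES.get(island["type"])
        if module is None:
            return None                      # no rule for this island type: failure
        decision, subject = module(island, context)
        if decision == STOP:
            return subject
        if decision == FAIL:
            return None
    return None

# "No reference to the war, or to the alliance, should remain"
# (the Conj island is omitted to keep the sketch short)
islands = [{"type": "PP", "head": "to the alliance"},
           {"type": "PP", "head": "to the war"},
           {"type": "NP", "head": "No reference"}]
print(walk_islands(islands))   # {'type': 'NP', 'head': 'No reference'}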

Page 19: Robust rule-based parsing

IV. Path example: following up

Korea who we believe to have WMD is safe from us.

Islands between the tensed verb and its subject: a PP and a relative clause. The PP module follows up over the PP island; the Clause module recognizes the pattern _ RelPron [[SUJ Pron] Verb] ("who we believe") and follows up to its left; SUBJ = Korea.

Page 20: Robust rule-based parsing

IV. Path example : repair

Many adults believe education equates intelligence.

Clause module repair: the initial analysis ## [[SUBJ NP] Verb [OBJ NP]] is rewritten as Verb [[SUBJ NP] Verb]OBJ, i.e. "education", first attached as the object of "believe", becomes the subject of "equates", and the embedded clause becomes the object of "believe".

Page 21: Robust rule-based parsing

IV. Path example : sub-module call

On the walls were scarlet banners

The pattern ## [PP] Verb _ (sentence-initial PP followed by the tensed verb) triggers a call to the InvertedSubject module: the PP module handles "On the walls", and the inverted subject "banners" (NP) is identified as SUBJ.

Page 22: Robust rule-based parsing

IV. Path example : change path

On the contrary, war hysteria was continuous and deliberate, and acts such as looting, murdering, the slaughters of prisoners, were considered as normal.

The Commas module changes the path direction so as to deal with the comma-delimited material, then hands over to the PP and Clause modules.

Other examples:
- All three political Parties at the federal level, and certainly at the provincial level in different sections, have parity clauses.
- Although no directive was ever issued, it was known that the chief of the Department intended that within one week no reference to the war with Eurasia, or to the alliance, should remain.

Handling these configurations: +2.6 recall, -0.07 precision.

Page 23: Robust rule-based parsing

IV. Evaluation on Susanne Corpus

- Tensed verb identification (TreeTagger): precision 94.87, recall 89.76, f-measure 92.24
- Subject identification (if tensed verb correct): precision 94.56, recall 90.84, f-measure 92.66
- Subject relation (correct tensed verb and correct subject): precision 89.51, recall 81.53, f-measure 85.33
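As a sanity check, the f-measure is the harmonic mean of precision and recall: for tensed verb identification, 2 × 94.87 × 89.76 / (94.87 + 89.76) ≈ 92.24, which matches the figures above.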

Only shallow (surface) subjects are evaluated; deep subjects are not identified or evaluated:
- I've never seen the dog hiding his bones.
- She wants me to clean my shoes.
- The book is read by the boy.

Page 24: Robust rule-based parsing

Bibliography

Abney (1990) : « Rapid Incremental Parsing with Repair », Proceedings of the 6th New OED Conference, University of Waterloo, Waterloo, Ontario.

Abney (1995) : «Partial Parsing with finite state cascade », Natural Language Engineering, Cambridge University Press www.sfs.uni-tuebingen.de/~abney/StevenAbney.html#cass

Aït-Mokhtar et al. (1997) : « Incremental Finite State Parsing », Proceedings of the ANLP-97, Washington

Bourigault (2007) : Syntex, analyseur syntaxique opérationnel, Thèse d’Habilitation à Diriger les Recherches, Université Toulouse - Le Mirail. w3.univ-tlse2.fr/erss/textes/pagespersos/bourigault/syntex.html

Charniak (2000) : « A maximum-entropy-inspired parser », Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 132–139. http://www.cfilt.iitb.ac.in/~anupama/charniak.php

Lin (1995) : « Dependency-based Evaluation of Minipar », Proceedings of IJCAI. http://www.cs.ualberta.ca/~lindek/downloads.htm

Tapanainen & Järvinen (1997) : « A Dependency Parser for English », Technical Report TR-1, Department of General Linguistics, University of Helsinki, March 1997. www.connexor.com

TreeTagger : http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Evaluation corpus : ftp://ftp.cs.umanitoba.ca/pub/lindek/depeval