1 REFERENTIAL CHOICE AS A PROBABILISTIC MULTI-FACTORIAL PROCESS Andrej A. Kibrik, Grigorij B. Dobrov, Natalia V. Loukachevitch, Dmitrij A. Zalmanov [email protected]

REFERENTIAL CHOICE AS A PROBABILISTIC MULTI-FACTORIAL PROCESS

Jan 09, 2016

Transcript
Page 1: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


REFERENTIAL CHOICE

AS A PROBABILISTIC MULTI-FACTORIAL PROCESS

Andrej A. Kibrik, Grigorij B. Dobrov, Natalia V. Loukachevitch,

Dmitrij A. Zalmanov [email protected]

Page 2: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Referential choice in discourse

When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including:

• Full noun phrase (NP)
  • Proper name (e.g. Pushkin)
  • Common noun (with modifiers) = definite description (e.g. the poet)
• Reduced NP, particularly a third person pronoun (e.g. he)

Page 3: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Example

Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <...>

How is this choice made?

[Diagram labels: full NP, pronoun, antecedent, coreference, anaphors]

Page 4: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Why is this important?

Reference is among the most basic cognitive operations performed by language users

It is the linguistic representation of what is known as attention and working memory in psychology

Reference constitutes the lion’s share of all information in natural communication

Consider text manipulation according to the method of Biber et al. 1999: 230-232

Page 5: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Referential expressions marked in green

Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <...>

Page 6: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Referential expressions removed

Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <...>

Page 7: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Referential expressions kept

Tandy said consumer electronics sales at its Radio Shack stores have been slow, partly because a lack of hot, new products. Radio Shack continues to be lackluster, said Dennis Telzrow, analyst with Eppler, Guerin Turner in Dallas. He said Tandy has done <...>

Page 8: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Plan of talk

I. Referential choice as a multi-factorial process

II. The RefRhet corpus and the machine learning-based approach

III. The probabilistic character of referential choice

Page 9: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Multi-factorial character of referential choice

Many factors of referential choice:

• Distance to antecedent
  • Along the linear discourse structure
  • Along the hierarchical discourse structure
• Antecedent role
• Referent animacy
• Protagonisthood
• ...

None of these factors alone can explain referential choice

Page 10: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Factors integration

At every point in discourse, factors are somehow summed and give rise to an integral characterization – the referent’s activation score

Activation score is the referent’s status with respect to the speaker’s working memory

Activation score predetermines referential choice:

• Low → full NP
• Medium → full or reduced NP
• High → reduced NP
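The three activation bands can be sketched as a simple threshold function. This is only an illustration: the 0 to 1 scale and both cut-off values are assumptions, since the slide names the bands but gives no numbers.

```python
def referential_options(activation, low_cutoff=0.33, high_cutoff=0.66):
    """Map an activation score to the referential forms it admits.

    The [0, 1] scale and the two cut-offs are hypothetical values
    chosen for illustration; they do not come from the talk.
    """
    if not 0.0 <= activation <= 1.0:
        raise ValueError("activation must lie in [0, 1]")
    if activation < low_cutoff:
        return ["full NP"]                # low activation
    if activation < high_cutoff:
        return ["full NP", "reduced NP"]  # medium activation: either form
    return ["reduced NP"]                 # high activation


for score in (0.1, 0.5, 0.9):
    print(score, referential_options(score))
```

The medium band returns both forms, which is the case where the choice is not fully determined by the score.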

Page 11: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Multi-factorial model of referential choice (Kibrik 1999)

[Diagram: Various properties of the referent or discourse context (activation factors) → Referent’s activation score → Referential choice]

Page 12: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Modeling multi-factorial processes: machine learning-based methods

Neural networks approach (Gruening and Kibrik 2005)

• Machine learning algorithm
  • Automatic selection of factors’ weights
  • Automatic reduction of the number of factors («pruning»)
• However:
  • Small data set
  • Single method of machine learning
  • Low interpretability of results

Hence a new study

• Large corpus
• Implementation of several machine learning methods
• Statistical model of referential choice

Page 13: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

The RefRhet corpus

• English
• Business prose
• Initial material – the RST Discourse Treebank
  • Annotated for hierarchical discourse structure
  • 385 articles from Wall Street Journal
• The added component – referential annotation
• The RefRhet corpus: over 30 000 referential expressions

Page 14: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Example of a hierarchical graph

Page 15: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Scheme of referential annotation

• The MMAX2 program
• Krasavina and Chiarcos 2007
• All markables are annotated, including:
  • Referential expressions
  • Their antecedents
• Coreference relations are annotated
• Features of referents and context are annotated that can potentially be factors of referential choice


Page 17: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Work on referential annotation

• O. Krasavina
• A. Antonova
• D. Zalmanov
• A. Linnik
• M. Khudyakova
• Students of the Department of Theoretical and Applied Linguistics, MSU

Page 18: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Current state of the RefRhet referential annotation

• 2/3 completed
• Further results are based on the following data:
  • 247 texts
  • 110 thousand words
  • 26 024 markables
    • 7097 proper names
    • 8560 definite descriptions
    • 1797 third person pronouns
  • 3756 reliable pairs «anaphor – antecedent»
    • Proper names: 1623 (43%)
    • Definite descriptions: 971 (26%)
    • Pronouns: 1162 (31%)
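As a quick arithmetic check, the class shares and the majority-class baselines can be recomputed from the pair counts above. The variable names are illustrative; only the three counts come from the slide.

```python
# Counts of reliable anaphor-antecedent pairs, by form of the anaphor.
pair_counts = {
    "proper name": 1623,
    "definite description": 971,
    "pronoun": 1162,
}
total = sum(pair_counts.values())

# Percentage share of each form, rounded to whole percent.
shares = {form: round(100 * n / total) for form, n in pair_counts.items()}

# Majority-class baseline: always predict the most frequent class.
three_way_baseline = max(pair_counts.values()) / total
full_np = pair_counts["proper name"] + pair_counts["definite description"]
binary_baseline = max(full_np, pair_counts["pronoun"]) / total

print(total, shares)
print(f"three-way baseline: {three_way_baseline:.1%}")
print(f"binary baseline:    {binary_baseline:.1%}")
```

The two baselines come out at roughly 43% and 69%, which matches the "largest class" row reported later in the talk.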

Page 19: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Factors of referential choice

Properties of the referent:

• Animacy
• Protagonisthood

Properties of the antecedent:

• Type of syntactic phrase (phrase_type)
• Grammatical role (gramm_role)
• Form of referential expression (np_form, def_np_form)
• Whether it belongs to direct speech or not (dir_speech)

Page 20: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Factors of referential choice

Properties of the anaphor:

• First vs. nonfirst mention in discourse (referentiality)
• Type of syntactic phrase (phrase_type)
• Grammatical role (gramm_role)
• Whether it belongs to direct speech or not (dir_speech)

Distance between the anaphor and the antecedent:

• Distance in words
• Distance in markables
• Linear distance in clauses
• Hierarchical distance in elementary discourse units
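One way to picture a single training instance is as a record holding all of these factors plus the form to be predicted. The sketch below is an assumption about layout, not the corpus format: the field names phrase_type, gramm_role, np_form and dir_speech echo the annotation labels on the slides, while the `ante_` prefix, the distance field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnaphorInstance:
    # Properties of the referent
    animate: bool
    protagonist: bool
    # Properties of the antecedent (the ante_ prefix is illustrative)
    ante_phrase_type: str
    ante_gramm_role: str
    ante_np_form: str
    ante_dir_speech: bool
    # Properties of the anaphor
    first_mention: bool
    phrase_type: str
    gramm_role: str
    dir_speech: bool
    # Distances between anaphor and antecedent
    dist_words: int
    dist_markables: int
    dist_clauses: int
    dist_edus: float  # hierarchical distance in elementary discourse units
    # Dependent variable: the form actually used in the text
    np_form: str


# A made-up instance, loosely modeled on a pronoun referring to a person
# mentioned one sentence earlier; none of the values come from the corpus.
example = AnaphorInstance(
    animate=True, protagonist=False,
    ante_phrase_type="np", ante_gramm_role="subject",
    ante_np_form="proper name", ante_dir_speech=False,
    first_mention=False, phrase_type="np", gramm_role="subject",
    dir_speech=False,
    dist_words=9, dist_markables=3, dist_clauses=1, dist_edus=1.0,
    np_form="pronoun",
)
print(example.np_form, example.dist_words)
```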

Page 21: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Goals for the machine learning-based study

• Dependent variable: form of referential expression (np_form)
• Binary prediction: full NP vs. pronoun
• Three-way prediction: definite description vs. proper name vs. pronoun
• Accuracy maximization: ratio of correct predictions to the overall number of instances
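The accuracy measure and the relation between the two prediction tasks can be sketched as follows. The label strings and the toy gold/predicted lists are made up for illustration; only the definition of accuracy and the two task setups come from the slide.

```python
FULL_NP_FORMS = {"proper name", "definite description"}


def to_binary(label):
    """Collapse a three-way label into the binary full NP vs. pronoun task."""
    return "full NP" if label in FULL_NP_FORMS else "pronoun"


def accuracy(gold, predicted):
    """Ratio of correct predictions to the overall number of instances."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels, not corpus data.
gold = ["pronoun", "proper name", "definite description", "pronoun"]
pred = ["pronoun", "definite description", "definite description", "proper name"]

three_way = accuracy(gold, pred)
binary = accuracy([to_binary(g) for g in gold], [to_binary(p) for p in pred])
print(three_way, binary)  # 0.5 0.75
```

Confusing a proper name with a definite description counts as an error only in the three-way task, so binary accuracy is never lower than three-way accuracy for the same predictions.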

Page 22: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Machine learning methods (Weka, a data mining system)

• Easily interpretable methods: logical algorithms
  • Decision trees (C4.5)
  • Decision rules (JRip)
• Higher quality: logistic regression
• Quality control – the cross-validation method
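Cross-validation can be illustrated with a small pure-Python sketch. It does not use Weka and does not implement C4.5, JRip or logistic regression: the classifier is a stand-in one-feature threshold rule, the data are synthetic, and the choice of 10 folds is an assumption, since the slide does not give the number.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(rows):
    """Stand-in learner: choose the distance threshold t that maximizes
    training accuracy of the rule 'pronoun if distance <= t'."""
    best_t, best_correct = 0, -1
    for t in sorted({dist for dist, _ in rows}):
        correct = sum((dist <= t) == (label == "pronoun") for dist, label in rows)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def cross_validate(rows, k=10):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held]
        t = fit_threshold(train)
        correct += sum(
            (rows[i][0] <= t) == (rows[i][1] == "pronoun") for i in held_out
        )
    return correct / len(rows)


# Synthetic (distance in words, label) pairs; not corpus data.
rows = [(d, "pronoun" if d <= 8 else "full NP") for d in range(1, 41)]
print(f"cross-validated accuracy: {cross_validate(rows):.3f}")
```

Every instance is used for testing exactly once and never by a model that saw it during training, which is what makes the pooled accuracy an honest estimate.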

Page 23: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Examples of decision rules generated by the JRip algorithm

(Antecedent’s grammatical role = subject) & (Hierarchical distance ≤ 1.5) & (Distance in words ≤ 7) => pronoun

(Animate) & (Distance in markables ≥ 2) & (Distance in words ≤ 11) => pronoun
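Read as an ordered rule list, the two rules above could be applied like this. The function and parameter names are illustrative, and the fall-through default of "full NP" is an assumption: the slide shows only the two pronoun rules, not the rest of the rule set.

```python
def predict_np_form(ante_gramm_role, hier_distance, dist_words,
                    animate, dist_markables):
    """Apply the two example rules in order; the first match wins."""
    # Rule 1: subject antecedent, hierarchically and linearly close.
    if ante_gramm_role == "subject" and hier_distance <= 1.5 and dist_words <= 7:
        return "pronoun"
    # Rule 2: animate referent, at least two markables away, within 11 words.
    if animate and dist_markables >= 2 and dist_words <= 11:
        return "pronoun"
    # Assumed default when no rule fires (not stated on the slide).
    return "full NP"


print(predict_np_form("subject", 1.0, 5, animate=False, dist_markables=1))
print(predict_np_form("object", 3.0, 10, animate=True, dist_markables=2))
print(predict_np_form("object", 3.0, 25, animate=True, dist_markables=6))
```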


Page 24: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Main results

Accuracy

• Binary prediction:
  • logistic regression – 86.1%
  • logical algorithms – 85%
• Three-way prediction:
  • logistic regression – 74%
  • logical algorithms – 72%

Page 25: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Comparison of single- and multi-factor accuracy

Feature                       Three-way prediction   Binary prediction
The largest class             43%                    69%
Distance in words             55%                    76%
Hierarchical distance         53.5%                  74.8%
Anaphor’s grammatical role    45.2%                  70%
Anaphor in direct speech      43.8%                  70%
Animate                       47.3%                  71.5%
Combination of factors        74%                    86.1%

Page 26: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Referential choice is a probabilistic process

According to Kibrik 1999:

Actual referential expressions:

• Full NP (49%)
• Pronoun (51%)

Potential referential expressions:

• Full NP only (19%)
• Full NP, ?pronoun (21%)
• Pronoun or full NP (28%)
• Pronoun, ?full NP (23%)
• Pronoun only (9%)

Page 27: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS


Probabilistic character of referential choice in the RefRhet study

Prediction of referential choice cannot be fully deterministic

There is a class of instances in which referential choice is random

It is important to tune the model so that it can process such instances in a special manner. We are working on this problem.

Logistic regression generates estimates of probability for each referential option

This estimate of probability can be interpreted as the activation score from the cognitive model
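How a logistic model turns factor values into a probability can be sketched as below. All numbers are hypothetical: the weights, the bias and the feature names are invented for illustration and are not the fitted RefRhet model. The "either" band is one possible way to flag instances where the choice is close to random, not the authors' solution.

```python
import math

# Hypothetical weights; positive values push toward a pronoun.
WEIGHTS = {
    "animate": 1.0,
    "ante_is_subject": 1.2,
    "dist_words": -0.15,
    "hier_distance": -0.5,
}
BIAS = 0.5


def pronoun_probability(features):
    """Logistic function of the weighted sum of factor values."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(p, band=0.15):
    """Treat probabilities near 0.5 as cases where both forms are acceptable."""
    if abs(p - 0.5) < band:
        return "either"
    return "pronoun" if p > 0.5 else "full NP"


near = {"animate": 1, "ante_is_subject": 1, "dist_words": 3, "hier_distance": 1}
far = {"animate": 0, "ante_is_subject": 0, "dist_words": 30, "hier_distance": 4}
for feats in (near, far):
    p = pronoun_probability(feats)
    print(f"p(pronoun) = {p:.3f} -> {decide(p)}")
```

Under this reading, the probability itself plays the role of the activation score: values near 1 favor a pronoun, values near 0 favor a full NP, and the middle range leaves the choice open.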

Page 28: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Probabilistic multi-factorial model of referential choice

[Diagram: Various properties of the referent or discourse context (activation factors) → Activation score = probability of using a certain referential expression → Referential choice]

Page 29: REFERENTIAL CHOICE  AS A PROBABILISTIC  MULTI-FACTORIAL PROCESS

Conclusions about the RefRhet study

• Quantity: large corpus of referential expressions
• Quality:
  • A high level of accurate prediction is already attained
  • And this is not the limit
• Theoretical significance: the following fundamental properties of referential choice are addressed:
  • Multi-factorial character of referential choice
  • Probabilistic character of referential choice

This approach can be applied to a wide range of linguistic and other behavioral choices