Towards a model of formal and informal address in English Manaal Faruqui Language Technologies Institute, CMU (Work done at IIT Kharagpur, India) Sebastian Padó Univ. of Heidelberg, Germany

Dec 29, 2015

Page 1

Towards a model of formal and informal address in English

Manaal Faruqui
Language Technologies Institute, CMU (Work done at IIT Kharagpur, India)

Sebastian Padó
Univ. of Heidelberg, Germany

Page 2

Formal and informal address

• Most languages distinguish formal (V) and informal (T) address in direct speech (Brown & Gilman 1960)
  • Formal address: neutrality, distance, used for “superordinates”
  • Informal address: used for friends, “subordinates”
• Variety of realizations in languages
  • Frequently pronoun choice (French vous/tu, German Sie/du)
  • Verbal inflection (e.g. Japanese)

Page 3

T/V and English

• Contemporary English is conspicuous by not realizing the T/V distinction
  • Pronoun “you” is both formal and informal
  • No differences in verbal inflection
• Does English really differ in such a fundamental way from virtually all other related languages?

Page 4

Main goals of this work

• Goal 1: Determine whether English distinguishes V and T consistently, but using different indicators
  • If yes, what are these indicators?
• Goal 2: Develop a computational model that labels English sentences as T or V
  • Ideally without spending effort on annotation

Page 5

Methodology

• Use a parallel corpus to analyze aligned sentences with overt (German) T/V choice and covert English T/V choice
  • For Goal 1: Compare German and English address
  • For Goal 2: Project German labels onto English sentences
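The projection step for Goal 2 can be sketched as follows. This is only a minimal illustration, assuming the corpus is already aligned one-to-one at the sentence level and the German side already carries a T, V or NONE label. The names `AlignedPair` and `project_labels`, and the example sentences, are hypothetical and not taken from the original work.

```python
from typing import List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    """One sentence pair from a sentence-aligned corpus (hypothetical layout)."""
    german: str
    english: str
    german_label: str  # "T", "V", or "NONE"


def project_labels(pairs: List[AlignedPair]) -> List[Tuple[str, str]]:
    """Copy the German T/V label onto the aligned English sentence.

    Pairs whose German side carries no address label are skipped,
    so the result contains only English sentences labeled T or V.
    """
    return [
        (pair.english, pair.german_label)
        for pair in pairs
        if pair.german_label in ("T", "V")
    ]


# Invented example pairs, for illustration only.
pairs = [
    AlignedPair("Was willst du?", "What do you want?", "T"),
    AlignedPair("Er ging fort.", "He went away.", "NONE"),
    AlignedPair("Was wünschen Sie?", "What do you wish?", "V"),
]
print(project_labels(pairs))
```

The English sentences never need manual annotation in this setup: the label is read off the German side and the English side only supplies the text.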

Page 6

Digression: Creation of a parallel corpus

• Current parallel corpora are not suitable
  • EUROPARL: overwhelmingly formal (>99%)
  • Newswire: no dialogue
• Creation of a new corpus: English-German literary texts
  • 106 19th-century novels and stories (Project Gutenberg)
  • Sentence-aligned: Gargantua (Braune & Fraser 2010)
  • POS-tagged (Schmid 1994)
• German sentences can be labeled as T, V or NONE
  • Rules for labeling follow on the next slide

Page 7

Labeling German Pronouns as T/V

• Du/du: Singular T
• Sie: Singular V (except for utterance-initial positions)
• sie: Ignored
  • Third person pronoun (she/they)
• ihr: Ignored
  • Plural T address or archaic sing./plural V address
  • Can ideally be distinguished by capitalization, but errors are present in the corpus
  • Dative form of the 3rd person “she” pronoun sie
  • Neutral wrt T/V
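The labeling rules above can be sketched as a small rule-based function. This is a simplified illustration, not the authors' implementation: the tokenizer, the set of punctuation marks treated as utterance boundaries, and the name `label_german_sentence` are all assumptions. Inflected pronoun forms are not handled because the slide does not list them. Sentences containing both T and V pronouns are returned as "BOTH" so that a caller can drop them.

```python
import re
from typing import List

# Tokens are either runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Punctuation after which the next token counts as utterance-initial (assumption).
UTTERANCE_BREAKS = {".", "!", "?", ":", '"', "„", "“", "»", "«"}


def label_german_sentence(sentence: str) -> str:
    """Return "T", "V", "BOTH", or "NONE" for one German sentence."""
    tokens: List[str] = TOKEN_RE.findall(sentence)
    has_t = has_v = False
    for i, tok in enumerate(tokens):
        utterance_initial = i == 0 or tokens[i - 1] in UTTERANCE_BREAKS
        if tok in ("du", "Du"):
            has_t = True
        elif tok == "Sie" and not utterance_initial:
            # Capitalized "Sie" inside an utterance is formal address.
            # At the start of an utterance it may be third-person "sie",
            # so it is skipped there.
            has_v = True
        # "sie", "ihr" and "Ihr" are deliberately ignored.
    if has_t and has_v:
        return "BOTH"
    if has_t:
        return "T"
    if has_v:
        return "V"
    return "NONE"


print(label_german_sentence("Was wünschen Sie?"))
```

Skipping utterance-initial "Sie" trades recall for precision: some genuine V sentences are lost, but third-person "sie" capitalized at the start of an utterance is never mislabeled as V.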

Page 8

Goal 1: Compare German and English address

• Give English monolingual text to human annotators
  • Ask for T/V judgment
• Their annotation provides the following information
  • How well do annotators agree on English text?
    • Does English monolingual text provide enough information to identify T/V? (1a)
  • How well do annotators agree with copied labels?
    • Is there a direct correspondence? (1b)
    • Only if this is the case is the copying of labels appropriate

Page 9

Experiment 1: Human Annotation

• 200 randomly drawn English sentences
• Two annotators (“A1”, “A2”)
• Two conditions:
  • No context: just one sentence
  • In context: three sentences pre- and post-context each

Page 10

Results: Reliability

• Context improves reliability
  • Many sentences cannot be tagged with T/V in isolation

“And she is a sort of relation of your lordship’s,” said Dawson. “And perhaps sometime you may see her.”

• Reliability in context is reasonable:
  • English does provide strong (if imperfect) clues on T/V

             No Context     In Context
A1 vs. A2    .75 (κ=.49)    .79 (κ=.58)

Goal 1a ✓
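The reliability figures pair raw agreement with a chance-corrected value. Assuming the values in parentheses are Cohen's kappa, the computation can be sketched as below. The toy annotations are invented for illustration and are not the data from the study.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy data (invented): the annotators agree on 6 of 8 items.
a1 = ["T", "T", "V", "V", "T", "V", "V", "T"]
a2 = ["T", "V", "V", "V", "T", "V", "T", "T"]
print(cohens_kappa(a1, a2))
```

In the toy data both annotators use T and V equally often, so chance agreement is 0.5; an observed agreement of 0.75 then corresponds to a kappa of 0.5, which is why raw agreement alone overstates reliability for a two-class task.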

Page 11

Results: Correspondence

                            No Context     In Context
(A1 ∩ A2) vs. Projection    .67 (κ=.34)    .79 (κ=.58)

• Agreement with German projected labels again reasonable, but not perfect
• Error analysis showed strong influence of social norms
  • Example: Lovers in 19th cent. novels use V (!)

[...] she covered her face with the other to conceal her tears. “Corinne!”, said Oswald, “Dear Corinne! My absence has then rendered you unhappy!”

Goal 1b ✓

Page 12

Experiment 2: Prediction of T/V

• Copy German T/V labels onto English: no annotation
• Learn L2-regularized logit classifier on train set; optimize on dev set; evaluate on test set
• Feature candidates:
  • Lexical features (bag-of-words, χ² feature selection)
  • Distributional semantic word classes
    • 200 word classes clustered with the algorithm by Clark (2003)
  • Politeness theory (Brown & Levinson 1987)
    • Polite speech has specific features, which are inherited by V
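The χ² feature selection named for the lexical features can be sketched in plain Python. This covers only the scoring of bag-of-words features against the T/V label; the L2-regularized classifier itself is not shown. Whitespace tokenization, lowercasing, binary word presence, and the name `chi_square_scores` are assumptions for the sake of the example, and the toy sentences are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple


def chi_square_scores(data: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score each word by the chi-square statistic of its 2x2 table
    (word present / absent  x  label V / T)."""
    n = len(data)
    n_v = sum(1 for _, label in data if label == "V")
    n_t = n - n_v
    with_word_v: Counter = Counter()
    with_word_t: Counter = Counter()
    for sentence, label in data:
        # Binary bag-of-words: count each word once per sentence.
        for word in set(sentence.lower().split()):
            (with_word_v if label == "V" else with_word_t)[word] += 1
    scores: Dict[str, float] = {}
    for word in set(with_word_v) | set(with_word_t):
        a, b = with_word_v[word], with_word_t[word]  # sentences with the word
        c, d = n_v - a, n_t - b                      # sentences without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Invented toy sentences with projected labels.
data = [
    ("yes sir", "V"),
    ("thank you sir", "V"),
    ("come here you", "T"),
    ("you fool", "T"),
]
scores = chi_square_scores(data)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Keeping only the highest-scoring words gives the classifier a vocabulary of terms whose presence depends most strongly on the T/V label.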

Page 13

Parallel Corpus: Some statistics

• German
  • #Sent_V: 37K & #Sent_T: 28K
  • Around 270 (<0.5%) sentences were both T & V
    • Ignored!
  • No error in manually verified randomly selected 300 German sentences
• English
  • #Sent_V: 25K & #Sent_T: 18K
  • Training data: 74 novels (26K)
  • Development data: 19 novels (9K)
  • Test data: 13 novels (8K)
• Corpus available at http://www.nlpado.de/

Page 14

Politeness theory features

Page 15

Context

• As shown by human annotation: individual sentences often insufficient for classification
• Simplest solution: Compute features over a window of context sentences
  • Problem: context typically includes non-speech sentences

“I am going to see his ghost!” Lorry quietly chafed the hands that held his arm.

Page 16

Context

• Our solution: A simple “direct speech” recognizer
  • CRF-based sequence tagger (Mallet) trained on 1000 sentences
• Ideal results for 8 sentences of direct speech context
  • +5% accuracy over no context

[Figure: sentence context vs. speech context]

B-SP: “I am going to see his ghost!”
O: Lorry quietly chafed the hands that held his arm.
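Once sentences carry speech tags, collecting a speech-only context window is straightforward. The sketch below assumes the tagger output is a list of (tag, sentence) pairs in which "O" marks narration and any other tag marks direct speech, and that the window covers preceding sentences only; the slides do not specify either detail. The function name `speech_context` and the third example sentence are hypothetical.

```python
from typing import List, Tuple


def speech_context(
    tagged: List[Tuple[str, str]], index: int, window: int = 8
) -> List[str]:
    """Return up to `window` direct-speech sentences preceding `tagged[index]`.

    Narration sentences (tag "O") are skipped, so the window reaches
    further back in the text than a plain sentence window would.
    """
    context: List[str] = []
    for tag, sentence in reversed(tagged[:index]):
        if tag != "O":
            context.append(sentence)
            if len(context) == window:
                break
    context.reverse()  # restore document order
    return context


tagged = [
    ("B-SP", "“I am going to see his ghost!”"),
    ("O", "Lorry quietly chafed the hands that held his arm."),
    ("B-SP", "“A later utterance.”"),  # invented placeholder sentence
]
print(speech_context(tagged, 2))
```

Features for the target sentence would then be computed over the target plus the returned context, with the narration sentence excluded.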

Page 17

Quantitative results

Model                      Accuracy
Frequency BL (V)           59.1
Lexical features           67.0
Semantic class features    57.5
Politeness features        59.6

• Only lexical features yield significant improvement over frequency baseline

Goal 2 ✓

Page 18

Qualitative analysis: Lexical Features

• Top 10 most-associated words for V (left) and T (right)
• V: Titles, formulaic language
• T: mixed bag, mostly very infrequent

Page 19

Qualitative analysis: Semantic classes

• Only 3-4 of 200 classes are associated with T or V

No.     P(c|V) / P(c|T)    Words with highest P(w|V) / P(w|T)
1.      4.59               Mister, sir, Monsieur, sirrah
2.      2.36               Mlle., Mr., Herr, Dr., Mrs.
3.      1.60               Gentlemen, patients, rascals
…       …                  …
200.    0.02               believest, lovest, makest, couldst
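The ratio in the second column can be estimated from labeled data roughly as follows. This sketch assumes P(c|V) is the relative frequency of class c among all class-bearing tokens in V sentences (and likewise for T), with no smoothing; the slides do not say how the probabilities were estimated. The name `class_ratios`, the class ids, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def class_ratios(
    data: List[Tuple[List[str], str]], word_class: Dict[str, int]
) -> Dict[int, float]:
    """Estimate P(c|V) / P(c|T) for each word class c by relative frequency.

    Classes never seen with T are skipped to avoid division by zero.
    """
    counts = {"V": Counter(), "T": Counter()}
    for tokens, label in data:
        for tok in tokens:
            if tok in word_class:
                counts[label][word_class[tok]] += 1
    total_v = sum(counts["V"].values())
    total_t = sum(counts["T"].values())
    if not total_v or not total_t:
        return {}
    return {
        c: (counts["V"][c] / total_v) / (n_t / total_t)
        for c, n_t in counts["T"].items()
    }


# Invented toy data: class 1 holds titles, class 2 holds pronouns.
word_class = {"sir": 1, "you": 2}
data = [
    (["sir", "you"], "V"),
    (["sir", "sir", "you"], "V"),
    (["you", "sir"], "T"),
    (["you", "you", "you"], "T"),
]
print(class_ratios(data, word_class))
```

A ratio well above 1 marks a class as indicative of V and a ratio well below 1 as indicative of T; values near 1 carry little information for the classifier.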

Page 20

Qualitative analysis: Politeness features

• Politeness features failed to yield a good result
• Problem 1: Hand-built lists have insufficient coverage
  • Difficult: what linguistic expressions convey “distance”?
• Problem 2: Features (at least in their current version) do not distinguish well between T and V
  • p(f|V)/p(f|T) values for all classes between 0.9 and 1.3
  • For 13 of 16 features, p(f|V)/p(f|T) > 1: indicative of V

Page 21

Conclusions

• Formal and informal language exists in English as well
  • Indicators more dispersed across context
• Bootstrapping a T/V classifier for English possible
• Results still fairly modest
  • Asymmetry: V more marked than T → better features
  • Difficult to operationalize features with high recall (sociolinguistic features, first names, …)

Page 22

Future Work

• Learn social networks from the novel

• Change the scope of T/V from the sentence level to a pair of interlocutors

Page 23

References

• M. Faruqui & S. Padó. “I thou thee, thou traitor”: Predicting formal vs. informal address in English literature. ACL 2011.
• M. Faruqui & S. Padó. Towards a model of formal and informal address in English. EACL 2012.
• Roger Brown and Albert Gilman. 1960. The pronouns of power and solidarity. In Thomas A. Sebeok, editor, Style in Language, pages 253–277. MIT Press, Cambridge, MA.
• Penelope Brown and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage. Number 4 in Studies in Interactional Sociolinguistics. Cambridge University Press.
• Fabienne Braune & Alexander Fraser. Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora. COLING 2010.
• Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 44–49, Manchester, UK.
• Andrew Kachites McCallum. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.

Page 24

Thank you!

Questions?

Please write to: [email protected]

[email protected]