
Automated Verb Sense Labelling Based on Linked Lexical Resources. Presentation by Judith Eckle-Kohler at EACL 2014 in Gothenburg. Joint work with Kostadin Cholakov and Iryna Gurevych.

Mar 20, 2017

Page 1

Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych

Automated Verb Sense Labelling Based on Linked Lexical Resources

Page 2

Outline

Automated Verb Sense Labelling in a Nutshell

Evaluation

Take Home Messages

April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler

Page 3

Motivation

Sense-annotated corpora are important resources in NLP

usually created manually, which is time-consuming and expensive

verbs have more senses, and thus annotating verb senses is more difficult

Solution

Using a large-scale linked lexical resource (UBY) to automatically create data annotated with verb senses

Page 4

Linking Lexical Resources at the Word Sense Level – example: UBY

[Figure: lexical resources linked in UBY; surviving labels: Web 2.0, IMSLex-Subcat, UBY]

Page 5

Linking Lexical Resources at the Word Sense Level – example: UBY

Open Source Java API: http://code.google.com/p/uby/

Page 6

Automated Verb Sense Labelling: Approach

UBY: verb sense patterns derived from lexical information

Corpus: verb sense patterns derived from verb instances

The two sets of patterns are compared with a similarity metric

Page 7

Step 1: Creation of sense patterns from enriched senses

WN ask%2:32:01 (make a request or demand for something to somebody) is linked to FN Id 639 (request to do or give something):

As twenty are required it might pay to ask your supplier for a "bulk discount".

UBY: [ask%2:32:01] be PP VV to ask person for a JJ act

Page 8

Step 1: Creation of sense patterns from enriched senses

Page 9

Step 1: Creation of sense patterns from enriched senses

[Figure labels: sense enrichment; predicate argument structure information]

Page 10

Step 2: Automated Labelling based on Pattern Similarity

WN ask%2:32:01 is linked to FN Id 639:

As twenty are required it might pay to ask your supplier for a "bulk discount".

UBY: [ask%2:32:01] be PP VV to ask person for a JJ act

Corpus instance: he wouldn't be pleased if a rumdum like me were to ask his daughter for a date

Corpus: if PP be to ask person for a time

Similarity score: 0.217 > threshold, so the corpus instance is labelled with ask%2:32:01
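The labelling decision on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `label_instance`, the choice to keep the best-scoring sense, and the stand-in `token_overlap` similarity are hypothetical and not taken from the slides, which only state that a score above the threshold leads to a label.

```python
from typing import Callable, Dict, List, Optional, Tuple


def token_overlap(p1: str, p2: str) -> float:
    """Stand-in similarity (Jaccard overlap of tokens); NOT the metric from the talk."""
    s1, s2 = set(p1.split()), set(p2.split())
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0


def label_instance(
    corpus_pattern: str,
    sense_patterns: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (sense, score) for the best sense pattern above the threshold, else None.

    Assumption: when several sense patterns exceed the threshold,
    the highest-scoring one determines the label.
    """
    best: Optional[Tuple[str, float]] = None
    for sense, patterns in sense_patterns.items():
        for pattern in patterns:
            score = similarity(pattern, corpus_pattern)
            if score > threshold and (best is None or score > best[1]):
                best = (sense, score)
    return best


# Patterns copied from the slide; the threshold value is illustrative.
uby_patterns = {"ask%2:32:01": ["be PP VV to ask person for a JJ act"]}
corpus_pattern = "if PP be to ask person for a time"
result = label_instance(corpus_pattern, uby_patterns, token_overlap, threshold=0.1)
```

Passing the similarity function as a parameter keeps the decision logic independent of whichever metric is actually used.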

Page 11

Step 2: Automated Labelling based on Pattern Similarity

Using a similarity metric to compare patterns derived from UBY with patterns derived from verb instances found in corpora

Considers the common bi-, tri-, and four-grams of two patterns

Takes word order into account!

In the similarity formula, w >= 1 is the window around the verb and G_n(p_i) is the set of n-grams occurring in pattern p_i
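A minimal sketch of an n-gram overlap similarity in this spirit is given below. The formula from the slide is not available here, so the normalisation (dividing by the largest number of n-grams the two patterns could share) is an assumption, as are the function names. The sketch also assumes that both patterns have already been cut to the window w around the verb, and it is not expected to reproduce the score 0.217 from the earlier example.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return G_n(p): the set of n-grams occurring in a tokenised pattern."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pattern_similarity(p1: str, p2: str, sizes: Tuple[int, ...] = (2, 3, 4)) -> float:
    """Share of common bi-, tri- and four-grams between two patterns.

    Assumed normalisation: the sum over n of min(|G_n(p1)|, |G_n(p2)|),
    i.e. the most n-grams the two patterns could have in common.
    """
    t1, t2 = p1.split(), p2.split()
    common = 0
    possible = 0
    for n in sizes:
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        common += len(g1 & g2)
        possible += min(len(g1), len(g2))
    return common / possible if possible else 0.0


# Patterns copied from the slides.
score = pattern_similarity(
    "be PP VV to ask person for a JJ act",
    "if PP be to ask person for a time",
)
```

Because n-grams are ordered tuples, two patterns containing the same tokens in a different order share no n-grams, which is how word order enters the score.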

Page 12

Outline

Automated Verb Sense Labelling in a Nutshell

Evaluation

Take Home Messages

Page 13

Intrinsic Evaluation

Evaluation on occurrences of the Senseval-3 verbs in SemCor (152 verbs)

Approx. 33,000 sense patterns generated from WN-FN-WKT (WordNet, FrameNet, Wiktionary) for these verbs

Evaluated at various similarity thresholds t

Page 14

Extrinsic Evaluation – Experimental Setup

Comparison of two supervised classifiers for verb sense disambiguation:

1. Trained on an automatically labelled corpus (ALC): verb senses for the test verbs given in MASC and Senseval-3 are labelled in a huge Web corpus with similarity threshold t=0.1

2. Trained on SemCor 3.0

Test data:

1. MASC corpus: 16 verbs annotated with WordNet 3.0 senses, 11,997 test instances

2. Senseval-3 dataset for all-words WSD: 152 verbs annotated with WordNet 3.0 senses, 442 test instances

Page 15

Training Sets

[Bar chart: number of training instances (axis from 0 to 400,000) for ALC and SemCor]

SemCor 3.0: approx. 22,000 training instances of the 16 MASC and 152 Senseval-3 verbs

Automatically labelled corpus (ALC): approx. 350,000 training instances of the 16 MASC and 152 Senseval-3 verbs

Page 16

Classification

Preprocessing: POS tagging, dependency parsing and named entity recognition, using the TreeTagger and the Stanford Parser and Named Entity Recognizer from the DKPro Core component collection, http://dkpro-core-asl.googlecode.com

Features: lexical, syntactic and semantic features

Classification: a separate logistic regression classifier is trained for each of the test verbs, using WEKA, http://www.cs.waikato.ac.nz/ml/weka/

Page 17

Performance of classifiers (accuracy) evaluated on MASC / Senseval-3

Trained on SemCor 3.0: MASC 50.23; Senseval-3 48.64 (45.20 with back-off)

Trained on the automatically labelled corpus (ALC): MASC 49.00; Senseval-3 47.51 (43.24 with back-off)

MFS baseline for the two test sets: MASC 41.72; Senseval-3 25.34
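For readers unfamiliar with the most-frequent-sense (MFS) baseline, the following sketch shows one common way to compute it. It assumes the most frequent sense of each verb is counted in the training data; the slides do not say how the MFS was determined, and the function name and the toy sense labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Instance = Tuple[str, str]  # (verb lemma, sense label)


def mfs_baseline_accuracy(train: List[Instance], test: List[Instance]) -> float:
    """Accuracy (in percent) of always predicting each verb's most frequent training sense."""
    counts: Dict[str, Counter] = {}
    for verb, sense in train:
        counts.setdefault(verb, Counter())[sense] += 1
    # Most frequent sense per verb; verbs unseen in training get no prediction.
    mfs = {verb: c.most_common(1)[0][0] for verb, c in counts.items()}
    correct = sum(1 for verb, sense in test if mfs.get(verb) == sense)
    return 100.0 * correct / len(test) if test else 0.0


# Toy data with made-up sense labels, for illustration only.
train = [("ask", "s1"), ("ask", "s1"), ("ask", "s2"), ("run", "r1")]
test = [("ask", "s1"), ("ask", "s2"), ("run", "r1"), ("run", "r2")]
accuracy = mfs_baseline_accuracy(train, test)
```

A classifier is only informative to the extent that it beats this number, which is why the baseline is reported next to the classifier accuracies.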

Page 18

Extrinsic Evaluation – effect of sense enrichment

Best results with the combination WordNet-FrameNet-Wiktionary

WordNet-FrameNet achieves similar accuracy, but its coverage is lower

WordNet-FrameNet-Wiktionary-VerbNet achieves lower accuracy

Using WordNet only achieves the lowest coverage and accuracy

Page 19

Outline

Automated Verb Sense Labelling in a Nutshell

Evaluation

Take Home Messages

Page 20

Take Home Messages

Linked lexical resources such as UBY are knowledge bases that can be used to perform automated verb sense labelling

The automatically labelled data can successfully be used to train supervised machine learning systems (distant / weak supervision)

This is due to the enriched sense representation for word senses that are interlinked

Particularly useful for languages such as German, where lexical resources are available but no sense-labelled data exist

Page 21

Thank You!

Questions?

Page 22

Training Data Coverage

Coverage of WN senses annotated in MASC in the training data:

There are 22 WN senses with instances in MASC which are not found in SemCor

There are 34 WN senses with instances in MASC which are not found in the ALC

The VSD system cannot correctly classify instances of those senses

The coverage of the WN senses annotated in the test sets by the training data constitutes the upper bound for our classifiers:

ALC: 0.8805 (increasing the size of the ALC does not help)

SemCor: 0.948
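The coverage figures above can be read as an accuracy ceiling: a classifier can never predict a sense it has not seen in training. A rough sketch of such a computation follows. It assumes coverage is measured per test instance (the share of test instances whose gold sense occurs in the training data); the slides do not spell out the exact definition, and the function name and toy labels are hypothetical.

```python
from typing import Iterable, List


def sense_coverage(train_senses: Iterable[str], test_senses: List[str]) -> float:
    """Fraction of test instances whose gold sense occurs at least once in training."""
    seen = set(train_senses)
    covered = sum(1 for sense in test_senses if sense in seen)
    return covered / len(test_senses) if test_senses else 0.0


# Toy data with made-up sense labels: "s3" never occurs in training,
# so even a perfect classifier gets at most 3 of 4 instances right.
upper_bound = sense_coverage(["s1", "s2", "s1"], ["s1", "s1", "s2", "s3"])
```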

Page 23

Comparison with other systems for verb sense disambiguation

State-of-the-art supervised system (Chen and Palmer 2009) on Senseval-2 data: 0.648 accuracy, MFS baseline: 0.407

Not comparable due to different versions of WordNet used

Best performing Lesk-based system (Miller et al., 2012):

33.86% accuracy for the MASC verbs

30.16% accuracy for the Senseval-3 verbs