LING 581: Advanced Computational Linguistics Lecture Notes April 27th

Dec 18, 2015

LING 581: Advanced Computational Linguistics

Lecture Notes
April 27th

TCE

• 2nd last class today…

WordNet Homework

You should have emailed me this report already; if you haven’t, please do so.

QA Homework

• Idea: evaluate the feasibility of QA on the web
– using TREC 9 QA examples
– programming it up is optional
– see and appreciate why it’s hard to do...

• Steps:
– Pick 3 query groups
– Simulate (programmatically) QA
– use the Collins parser and WordNet to find answers to the queries
– submit report (before final class next week)

• Example
• Question group
– What kind of animal was Winnie the Pooh?
– Winnie the Pooh is what kind of animal?
– What species was Winnie the Pooh?
– Winnie the Pooh is an imitation of which animal?
– What was the species of Winnie the Pooh?

Example

• trees

• reformulate Qs into declarative sentences with missing wh-phrase
– ____ (kind of animal) is winnie the pooh
– winnie the pooh is ____ (species)
– winnie the pooh is an imitation of ____ (animal)
– the species of winnie the pooh is ____
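
The reformulation step can be roughed out with surface patterns. This is only a sketch: the slide works from parse trees (Collins parser), whereas the code below swaps in regular expressions, and the pattern list and the function name `reformulate` are hypothetical. It covers the five questions in the example group and nothing more.

```python
import re

# Hypothetical surface patterns (an assumption; the lecture uses parse trees).
# Order matters: more specific patterns come first.
PATTERNS = [
    # "What kind of animal was X?" -> "____ (kind of animal) is X"
    (re.compile(r"^what (kind of \w+) (?:was|is) (.+)\?$", re.I),
     lambda m: f"____ ({m.group(1)}) is {m.group(2)}"),
    # "X is what kind of animal?" -> "X is ____ (kind of animal)"
    (re.compile(r"^(.+) is what (kind of \w+)\?$", re.I),
     lambda m: f"{m.group(1)} is ____ ({m.group(2)})"),
    # "What species was X?" -> "X is ____ (species)"
    (re.compile(r"^what (\w+) (?:was|is) (.+)\?$", re.I),
     lambda m: f"{m.group(2)} is ____ ({m.group(1)})"),
    # "X is an imitation of which animal?" -> "X is an imitation of ____ (animal)"
    (re.compile(r"^(.+) of which (\w+)\?$", re.I),
     lambda m: f"{m.group(1)} of ____ ({m.group(2)})"),
    # "What was the species of X?" -> "the species of X is ____"
    (re.compile(r"^what (?:was|is) (.+)\?$", re.I),
     lambda m: f"{m.group(1)} is ____"),
]


def reformulate(question):
    """Return a lowercased declarative template with a ____ gap, or None."""
    q = question.strip()
    for pattern, build in PATTERNS:
        match = pattern.match(q)
        if match:
            return build(match).lower()
    return None


if __name__ == "__main__":
    print(reformulate("What was the species of Winnie the Pooh?"))
```

The gap template can then be turned into a web search string by dropping the ____ part, e.g. searching for "winnie the pooh is" and reading off what follows.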

Example

• Answers:
• Winnie the Pooh is such a popular character in Poland
• Winnie-the-Pooh Is My Co-worker
• Winnie the Pooh is a little, adorable and cute bear obsessed by honey.
• Winnie-the-Pooh is so fat.
• Winnie the Pooh is one of the things most closest to my heart
• Winnie the Pooh is his usual befuddled self

Example

• Original declarative form:
– winnie the pooh is ____ (species)

• Check semantic relatedness of extracted head words using WordNet:
– character
– co-worker
– little
– one
– self

• Here, look at shortest paths

Summary
Headword    Length  #nodes
bear        6       9258
character   6       1072
one         7       734
co-worker   7       6488
self        7       14456
little      10      28706

Constraints: length < #nodes
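
A small sketch of how the summary rows might be ranked, assuming a shorter WordNet path means a closer match to "species". The rows are copied from the table above; the function names are hypothetical.

```python
# (headword, shortest-path length, #nodes) rows copied from the summary table.
SUMMARY = [
    ("bear", 6, 9258),
    ("character", 6, 1072),
    ("one", 7, 734),
    ("co-worker", 7, 6488),
    ("self", 7, 14456),
    ("little", 10, 28706),
]


def rank_by_path_length(rows):
    """Order headwords by ascending path length; ties keep their input order."""
    return [word for word, _length, _nodes in sorted(rows, key=lambda r: r[1])]


def shortest(rows):
    """Return every headword that shares the minimum path length."""
    best = min(length for _word, length, _nodes in rows)
    return [word for word, length, _nodes in rows if length == best]


if __name__ == "__main__":
    print(rank_by_path_length(SUMMARY))
    print(shortest(SUMMARY))
```

On these numbers, path length alone does not separate bear from character (both 6), which is one way to see why the task is hard.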

Other resources

• XWN: http://xwn.hlt.utdallas.edu/

glosses in Logical Form

XWN

• Applications
– The Extended WordNet may be used as a Core Knowledge Base for applications such as Question Answering, Information Retrieval, Information Extraction, Summarization, Natural Language Generation, Inferences, and other knowledge intensive applications.
– The glosses contain a part of the world knowledge since they define the most common concepts of the English language.

XWN

• Example: Dan Moldovan and Adrian Novischi, Lexical Chains for Question Answering, COLING 2002

COALS

• Take a look at an alternative to WordNet for computing similarity
– WordNet: hand-built system
– COALS:
  • the Correlated Occurrence Analogue to Lexical Semantics
  • (Rohde et al. 2004)
  • an instance of a vector-based statistical model for similarity
    – e.g., see also Latent Semantic Analysis (LSA)
    – Singular Value Decomposition (SVD)
      » sort by singular values, take top k and reduce the dimensionality of the co-occurrence matrix to rank k
  • based on weighted co-occurrence data from large corpora
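
The rank-k reduction mentioned under SVD can be written out as follows. The symbols are standard SVD notation and are not taken from the slides: X is the co-occurrence matrix and r its rank.

```latex
\[
X = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0
\]
\[
X_k = U_k \Sigma_k V_k^{\top}, \qquad k < r
\]
```

Here U_k and V_k keep only the first k columns of U and V, and Σ_k keeps the k largest singular values. X_k is the best rank-k approximation of X in the least-squares (Frobenius) sense, and the rows of U_k Σ_k can serve as k-dimensional word vectors.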

COALS

• Basic Idea:
– compute co-occurrence counts for (open class) words from a large corpus
– corpus:
  • Usenet postings over 1 month
  • 9 million (distinct) articles
  • 1.2 billion word tokens
  • 2.1 million word types
    – 100,000th word occurred 98 times
– co-occurrence counts
  • based on a ramped weighting system with window size 4
    – excluding closed-class items

Ramped window around the target word wi (weight by position):
position:  wi-4  wi-3  wi-2  wi-1  wi  wi+1  wi+2  wi+3  wi+4
weight:    1     2     3     4     -   4     3     2     1
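
A minimal sketch of ramped co-occurrence counting with window size 4, where a neighbour at distance d from the target gets weight 5 - d. How closed-class items interact with the window is not spelled out on the slide; the code assumes they keep their position but contribute no counts. The function name and the `closed_class` parameter are hypothetical.

```python
from collections import defaultdict

WINDOW = 4  # window size from the slide


def ramped_cooccurrence(tokens, closed_class=frozenset()):
    """Return {(target, neighbour): summed ramped weight} for a token list.

    Assumption: closed-class tokens still occupy a window position,
    but are never counted as targets or neighbours.
    """
    counts = defaultdict(int)
    for i, target in enumerate(tokens):
        if target in closed_class:
            continue
        for distance in range(1, WINDOW + 1):
            weight = WINDOW + 1 - distance  # 4, 3, 2, 1
            for j in (i - distance, i + distance):
                if 0 <= j < len(tokens) and tokens[j] not in closed_class:
                    counts[(target, tokens[j])] += weight
    return dict(counts)


if __name__ == "__main__":
    sentence = "winnie the pooh is a bear obsessed by honey".split()
    table = ramped_cooccurrence(sentence, closed_class={"the", "is", "a", "by"})
    print(table[("pooh", "bear")])
```

Each row of the resulting table, collected over a large corpus, is the raw material for a word's vector.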

COALS

• Example:

COALS

• available online: http://dlt4.mit.edu/~dr/COALS/similarity.php

Computing Similarity

Worked Example: zealous

• run connectbf/3
– ?- connectbf(impassioned,zealous,X).
– X = 10 ?
– ?- connectbf(zealous,impassioned,X).
– X = 9 ?

• compare to b. ravenous
– ?- connectbf(ravenous,zealous,X).
– no
– ?- connectbf(zealous,ravenous,X).

• shortest link between impassioned and zealous

Old Code: WordNet 1.7.1
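
The source of connectbf/3 is not shown in these notes, and what its third argument counts is not stated. Assuming the "bf" stands for breadth-first, the general idea can be sketched in Python on a made-up directed graph (the node names are placeholders, not WordNet data). Because links are directed, searching from a to c need not give the same answer as searching from c to a, and a search can fail outright, like the "no" above.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Breadth-first search over a directed graph.

    Returns the number of links on a shortest path from start to goal,
    or None when goal cannot be reached.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy graph: keys are nodes, values are outgoing links.
TOY = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

if __name__ == "__main__":
    print(shortest_path_length(TOY, "a", "c"))
    print(shortest_path_length(TOY, "c", "a"))
    print(shortest_path_length(TOY, "d", "a"))
```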

Worked Example: zealous

• shortest path between ravenous and zealous

Task: Match each word in the first column with its definition in the second column

• Words: accolade, abate, aberrant, abscond, acumen, abscission, acerbic, accretion, abjure, abrogate
• Definitions: deviation, abolish, keen insight, lessen in intensity, sour or bitter, building up, depart secretly, renounce, removal, praise

Task: Match each word in the first column with its definition in the second column

[Same word and definition lists as on the previous slide, annotated with the numbers 3, 2, 3, 2, 2, 2, 2; which pairs the numbers belong to is not recoverable from the transcript]

COALS and the GRE: ACCOLADE

[Bar chart: Correlation (axis from -0.05 to 0.35) of ACCOLADE with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]
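
The charts in this part of the notes plot a correlation score between a GRE word and each candidate definition word. A minimal sketch of Pearson correlation between two word vectors, assuming that is the score being plotted (COALS uses correlation between vectors as its similarity measure). The function name is hypothetical, and any vectors passed in would come from the co-occurrence matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors.

    Returns 0.0 when either vector has no variance, where the
    correlation is undefined (a convention chosen for this sketch).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))
```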

COALS and the GRE: ABERRANT

[Bar chart: Correlation (axis from -0.04 to 0.14) of ABERRANT with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ABATE

[Bar chart: Correlation (axis from -0.1 to 0.2) of ABATE with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ABSCOND

[Bar chart: Correlation (axis from -0.06 to 0.1) of ABSCOND with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ACUMEN

[Bar chart: Correlation (axis from -0.05 to 0.25) of ACUMEN with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ACERBIC

[Bar chart: Correlation (axis from -0.1 to 0.15) of ACERBIC with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ACCRETION

[Bar chart: Correlation (axis from -0.05 to 0.05) of ACCRETION with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ABJURE

[Bar chart: Correlation (axis from -0.1 to 0.35) of ABJURE with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

COALS and the GRE: ABROGATE

[Bar chart: Correlation (axis from -0.1 to 0.3) of ABROGATE with DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE]

Heuristic: competing words, pick the strongest

[Same word and definition lists as in the matching task above]

7 out of 10
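
One way to read "competing words, pick the strongest" is as a greedy assignment: when several words favour the same definition, the word with the highest correlation gets it and the others fall back to their next-best free definition. The sketch below implements that reading, which is an assumption about the heuristic. The scores are made-up placeholders (w1, w2, d1, d2), not values read off the charts, and the function name is hypothetical.

```python
def match_strongest_first(scores):
    """Greedy one-to-one matching.

    scores maps (word, definition) to a correlation. Pairs are visited
    from strongest to weakest; a pair is accepted only if neither its
    word nor its definition has been used yet.
    """
    assignment = {}
    taken = set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (word, definition), _score in ranked:
        if word not in assignment and definition not in taken:
            assignment[word] = definition
            taken.add(definition)
    return assignment


# Hypothetical scores: both words prefer d1, but w1's claim is stronger.
TOY_SCORES = {
    ("w1", "d1"): 0.30,
    ("w1", "d2"): 0.10,
    ("w2", "d1"): 0.20,
    ("w2", "d2"): 0.05,
}

if __name__ == "__main__":
    print(match_strongest_first(TOY_SCORES))
```

Without the heuristic, both w1 and w2 would be matched to d1; with it, w2 is pushed to d2.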