LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 15: 10/16

Dec 21, 2015

Page 1

LING/C SC/PSYC 438/538 Computational Linguistics

Sandiway Fong

Lecture 15: 10/16

Page 2

Administrivia

• No lecture this Thursday

Page 3

Today’s Topics

• Midterm review

• Finite State Transducers (FST)

Page 4

Question 1

• Download the file wsj.txt (~50K lines)
• Write a Perl program that finds all lines containing any possible form of the idiom
    take ... advantage of ...
• How many are there in wsj.txt?
• Submit your program
• Submit the lines returned by your program

Page 5

Question 1

• First hit on Google:
– take advantage (of someone): to use someone's weakness to improve your own situation. Mr. Smith often takes advantage of my friendship and leaves the unpleasant tasks for me to do. See also: advantage, take
– take advantage (of something): to use an opportunity to get or achieve something. He took advantage of the prison's education program to earn a college degree. There are peaches and strawberries grown on the farm, and I sure take full advantage of them. Usage notes: often said of someone who has opportunities that others do not have: The rich can take advantage of clever accounting tricks to avoid taxes. See also: advantage, take
– Source: Cambridge Dictionary of American Idioms, Cambridge University Press 2003

Page 6

Answer 1

1. Investors took advantage of Tuesday 's stock rally
2. Like other forms of arbitrage , it merely seeks to take advantage of momentary discrepancies
3. As usually practiced it takes advantage of a rather basic concept
4. So if index arbitrage is simply taking advantage of thin inefficiencies
5. `` If you could get the rhythm of the program trading , you could take advantage of it . ''
6. Mrs. Gorman took advantage of low prices
7. According to Upjohn 's estimates , only 50 % to 60 % of the 1,100 eligible employees will take advantage of the plan .
8. Nissan has increased earnings more than market share by cutting costs and by taking advantage of a general surge
9. Mr. Peladeau took his first big gamble 25 years ago , when he took advantage of a strike at La Presse
10. In addition , the two companies will develop new steam turbine technology , such as the plants ordered by Florida Power , and even utilize each other 's plants at times to take advantage of currency fluctuations .
11. One of GE 's goals when it bought 80 % of Kidder in 1986 was to take advantage of `` syngeries ''
12. I take advantage of this opportunity given to me by The Wall Street Journal
• And taking more direct action has the advantage of avoiding sharp increases
13. To take advantage of local expertise and custom
14. Several blue-chip companies tapped the new-issue market yesterday to take advantage of falling interest rates .
15. He also noted that a strong sterling market yesterday might have helped cocoa in New York as arbitragers took advantage of the currency move .
16. My kids ' college education looms as perhaps the greatest future opportunity for spending , although I 'll probably have to cash in their toy portfolio to take advantage of it .
17. As the ad 's tone implies , the Texas spirit is pretty xenophobic these days , and Lone Star is n't alone in trying to take advantage of that .
18. IBM , which Gartner Group said generates 22 % of its revenue in this market , should be able to take advantage of its loyal following
19. Erik Keller , a Gartner Group analyst , said organizational changes may still be required to really take advantage of CIM 's capabilities

Page 7

Answer 1

20. These latter-day scalawags would be ill-advised to take advantage of the situation
21. Most of trading action now is from professional traders who are trying to take advantage of the price swings
22. For instance , First Quadrant Corp. , an asset allocator based in Morristown , N.J. , said it quickly boosted stock positions in its `` aggressive '' accounts to 75 % from 55 % to take advantage of plunging prices Friday .
23. Others are doing `` index arbitrage '' a strategy of taking advantage of price discrepancies
24. The campaign , created by Omnicom Group 's DDB Needham agency , takes advantage of the eye-catching photography
25. According to industry lawyers , the ruling gives pipeline companies an important second chance to resolve remaining disputes and take advantage of the cost-sharing mechanism .
26. Thanks to a new air-traffic agreement and the ability of Irish travel agents to issue Aeroflot tickets , tourists here are taking advantage of Aeroflot 's reasonable prices
27. But , `` You never can tell , '' he added , `` you have to take advantage of opportunities .
28. A broad rally began when several major processors began buying futures contracts , apparently to take advantage of the price dip .
29. `` We hope to take advantage of it , ''
30. And we hope to take advantage of panics
31. To take full advantage of the financial opportunities
32. Specifically , it must understand how real-estate markets overreact to shifts in regional economies and then take advantage of these opportunities .

Page 8

Answer 1

• Perl program: a simple way to exclude the case shown earlier (the line containing "the advantage of")

# Print every line of the file named on the command line that contains the idiom.
open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
while (<F>) {
  # $2 is the text between the verb and "advantage of"; skip the line if it contains "the"
  print $_ if (/\b(take|takes|taking|taken|took)\b(.*) advantage of/ && $2 !~ /\bthe\b/);
}
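For comparison, the same filter can be sketched in Python. This is only an illustrative translation of the Perl one-liner above, not part of the original answer; the names `is_idiom_line` and `sample` are made up for the example, and the sample lines are taken from the answer list.

```python
import re

# Same pattern as the Perl program: a form of "take", any intervening text
# (captured as group 2), then " advantage of".
IDIOM = re.compile(r"\b(take|takes|taking|taken|took)\b(.*) advantage of")
THE = re.compile(r"\bthe\b")


def is_idiom_line(line):
    """True if the line matches the idiom and the captured gap has no 'the'."""
    m = IDIOM.search(line)
    return m is not None and THE.search(m.group(2)) is None


# Three lines from the answer list; the last one is the excluded case.
sample = [
    "Investors took advantage of Tuesday 's stock rally",
    "To take full advantage of the financial opportunities",
    "And taking more direct action has the advantage of avoiding sharp increases",
]

if __name__ == "__main__":
    for line in sample:
        print(is_idiom_line(line), line)
```

Checking only the captured gap for "the" keeps lines such as "take full advantage of the ...", where "the" comes after the idiom.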

Page 9

Question 2

• Give a regular grammar in Prolog notation that accepts strings with an odd number of a’s (#a’s = 1,3,5,...) followed by an even number of b’s (#b’s = 2,4,6,...)
• i.e. a^n b^m, n odd, m even
• Examples:
– aaabb
– abbbb
– aaaaabb
– *aabb
– *aaab
• Submit your program
• Show it works on the given examples

Page 10

Answer 2

• Regular grammar in Prolog DCG format:
1. s --> [a], b.
2. s --> [a], d.
3. b --> [a], s.
4. d --> [b], e.
5. e --> [b].
6. e --> [b], d.

• Run
| ?- s([a,a,a,b,b],[]).
yes
| ?- s([a,b,b,b,b],[]).
yes
| ?- s([a,a,a,a,a,b,b],[]).
yes
| ?- s([a,a,b,b],[]).
no
| ?- s([a,a,a,b],[]).
no
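Because every rule above is right-linear, the grammar can be read directly as a (nondeterministic) finite state machine: each nonterminal is a state and each rule is an arc. The Python sketch below illustrates that reading under one assumption: the accepting state `end` is a hypothetical name introduced for rule 5, which has no nonterminal on its right-hand side.

```python
# Each rule "X --> [t], Y." becomes an arc X -t-> Y.
# Rule 5, "e --> [b].", becomes an arc into the hypothetical state "end".
RULES = [
    ("s", "a", "b"),    # 1. s --> [a], b.
    ("s", "a", "d"),    # 2. s --> [a], d.
    ("b", "a", "s"),    # 3. b --> [a], s.
    ("d", "b", "e"),    # 4. d --> [b], e.
    ("e", "b", "end"),  # 5. e --> [b].
    ("e", "b", "d"),    # 6. e --> [b], d.
]


def accepts(string, start="s"):
    """Track the set of reachable states; accept if 'end' is reachable."""
    states = {start}
    for ch in string:
        states = {dst for (src, sym, dst) in RULES
                  if src in states and sym == ch}
    return "end" in states


if __name__ == "__main__":
    for example in ["aaabb", "abbbb", "aaaaabb", "aabb", "aaab"]:
        print(example, accepts(example))
```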

Page 11

Question 3

• Using an extra argument with regular grammar rules in Prolog DCG format, give a grammar that accepts
• L = a^n b^m
• n even (n=2,4,6,...)
• m is the odd number closest to but not exceeding n/2
• Note: L is a non-regular language
• Examples:
– aab
– aaaab
– *aaaabb
– aaaaaabbb
– *aaaaaabbbb
– aaaaaaaabbb
– *aaaaaaaabbbb
– *aaaaaaaabbbbb
• Show your program works on the above examples

Page 12

Answer 3

• Program
1. s(X) --> [a], b(s(X)).
2. b(X) --> [a], c(s(X)).
3. b(X) --> [a], s(s(X)).
4. c(s(s(0))) --> [b].
5. c(s(s(s(s(0))))) --> [b].
6. c(s(s(X))) --> [b], d(X).
7. d(s(s(X))) --> [b], c(X).

• Run
| ?- s(0,[a,a,b],[]).
yes
| ?- s(0,[a,a,a,a,b],[]).
yes
| ?- s(0,[a,a,a,a,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,b,b,b],[]).
yes
| ?- s(0,[a,a,a,a,a,a,b,b,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b],[]).
yes
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b],[]).
no
| ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b,b],[]).
no
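The extra argument in the program above acts as a counter: rules 1 to 3 add one s(...) per a, and rules 6 and 7 remove two per b, with rules 4 and 5 accepting the final b when the counter stands at 2 or 4. The Python sketch below mimics that bookkeeping with an integer and cross-checks it against the definition of L. It is an illustration only; the function names are hypothetical.

```python
def expected_b_count(n):
    """Largest odd number not exceeding n/2, for even n >= 2."""
    half = n // 2
    return half if half % 2 == 1 else half - 1


def in_language(string):
    """Membership test taken straight from the definition of L."""
    n = len(string) - len(string.lstrip("a"))
    rest = string[n:]
    if n < 2 or n % 2 != 0 or set(rest) - {"b"}:
        return False
    return len(rest) == expected_b_count(n)


def accepts_by_counter(string):
    """Mimic the DCG: count the a's, then spend the counter on the b's."""
    n = len(string) - len(string.lstrip("a"))
    bs = string[n:]
    if n == 0 or n % 2 != 0 or set(bs) - {"b"}:
        return False
    counter, state = n, "c"
    for i in range(len(bs)):
        last = i == len(bs) - 1
        if state == "c" and last:
            return counter in (2, 4)      # rules 4 and 5
        if counter < 2 or (state == "d" and last):
            return False                  # no rule applies
        counter -= 2                      # rules 6 and 7
        state = "d" if state == "c" else "c"
    return False                          # no b's at all


if __name__ == "__main__":
    for w in ["aab", "aaaab", "aaaabb", "aaaaaabbb", "aaaaaaaabbbb"]:
        print(w, in_language(w), accepts_by_counter(w))
```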

Page 13

Question 4

• Give a regexp for the language described in Question 2
• a^n b^m, n odd, m even

Page 14

Answer 4

• a^n b^m, n odd, m even
• a(aa)*(bb)+
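The regexp can be checked against the Question 2 examples with a few lines of Python. This is a sketch for self-checking only; `matches` is a hypothetical helper, and `fullmatch` is used so that the whole string must fit the pattern.

```python
import re

# a(aa)* gives an odd number of a's; (bb)+ gives an even, non-zero number of b's.
PATTERN = re.compile(r"a(aa)*(bb)+")


def matches(string):
    """True if the entire string is in the language a^n b^m, n odd, m even."""
    return PATTERN.fullmatch(string) is not None


if __name__ == "__main__":
    for example in ["aaabb", "abbbb", "aaaaabb", "aabb", "aaab"]:
        print(example, matches(example))
```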

Page 15

Question 5

• Give a regexp for the complement of the following FSA

[Figure: state diagram of an FSA with states 1 to 5 over the alphabet {a, b}]

Page 16

Answer 5

• Original machine is deterministic
• Flip the states: final states become non-final and non-final states become final

[Figure: the original FSA and the same machine with its states flipped]

Page 17

Answer 5

• Notice 5 is a dead-end state
• Erase 5

[Figure: the flipped machine with states 1 to 5, and the same machine with state 5 erased (states 1, 2, 3, 4)]

Page 18

Answer 5

• Eliminated state 5
• Eliminate states 2 and 4

[Figure: the four-state machine (states 1, 2, 3, 4), and the two-state machine (states 1 and 3) with arcs labeled ab and ba that remains after eliminating states 2 and 4]

(ab|ba)*

Page 19

Answer 5

• Eliminated state 5
• Equations
– E1 = aE2 | bE4 | λ
– E2 = bE3
– E4 = aE3
– E3 = aE2 | bE4 | λ
• Eliminate E4
– E1 = aE2 | baE3 | λ
– E3 = aE2 | baE3 | λ
• Eliminate E2
– E1 = abE3 | baE3 | λ
– E3 = abE3 | baE3 | λ
• Group E3
– E1 = (ab|ba)E3 | λ
– E3 = (ab|ba)E3 | λ
• Solve E3
– E3 = (ab|ba)*
– E1 = (ab|ba)(ab|ba)* | λ = (ab|ba)*

[Figure: the four-state machine (states 1, 2, 3, 4) after eliminating state 5]
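The result can be sanity-checked by brute force. The sketch below reads the four-state machine off the equations (an arc for every term such as aE2, and a final state wherever λ appears, i.e. states 1 and 3) and compares it with the regexp (ab|ba)* on every string over {a, b} up to a chosen length. The function names and the length bound are choices made for this illustration.

```python
import re
from itertools import product

# Arcs read off the equations: E1 = aE2 | bE4 | λ, E2 = bE3, E4 = aE3,
# E3 = aE2 | bE4 | λ. States 1 and 3 are final because of the λ terms.
DELTA = {
    (1, "a"): 2, (1, "b"): 4,
    (2, "b"): 3,
    (4, "a"): 3,
    (3, "a"): 2, (3, "b"): 4,
}
FINAL = {1, 3}
REGEX = re.compile(r"(ab|ba)*")


def machine_accepts(string):
    state = 1
    for ch in string:
        if (state, ch) not in DELTA:
            return False  # missing arc: the erased dead-end state 5
        state = DELTA[(state, ch)]
    return state in FINAL


def regex_accepts(string):
    return REGEX.fullmatch(string) is not None


def agree_up_to(max_len):
    """True if machine and regexp agree on all strings of length <= max_len."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            s = "".join(chars)
            if machine_accepts(s) != regex_accepts(s):
                return False
    return True


if __name__ == "__main__":
    print(agree_up_to(10))
```

Agreement on short strings is evidence rather than proof; the algebraic derivation above is what establishes the equivalence.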

Page 20

Question 6

• Give the deterministic FSA corresponding to:

Page 21

Answer 6

• Deterministic machine

[Figure: state diagram of the deterministic machine, with states 1 to 8 and arcs labeled a, b and c]

Page 22

Finite State Transducers

• Just like Finite State Automata (FSA) except for an output tape
• Mealy Machine formulation:
– at each transition, an FST can read an input symbol and output a (different) symbol onto the tape
• Background reading
– Chapter 3 of the textbook

Page 23

Morphology

• morphology
– words are composed of morphemes
– morpheme: basic semantic unit, e.g. -ee in employee
– Inflectional: no change in category, e.g. V + -ed → V
– can carry information about tense, person, number, gender, case etc.
– Derivational: category-changing, e.g. V + -able → A
– very productive

Page 24

Walkers. Standees.

© Sandiway Fong. Sign above travelator at Pittsburgh International Airport

Page 25

Today’s Topic

• Finite State Transducers (FST) for morphological processing

– ... also Prolog implementation

Page 26

Recall Finite State Automata (FSA)

• from lecture 8
– (Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} must be a finite set
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ, signature: character × state → state
– δ(a,s)=x
– δ(a,x)=x
– δ(b,x)=y
– δ(b,y)=y

[Figure: state diagram with states s, x, y and arcs s -a-> x, x -a-> x, x -b-> y, y -b-> y]
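The five-part definition above translates almost directly into a table-driven recognizer. The Python sketch below is an illustration, not the encoding used in the course (that one is in Prolog); the dictionary keys follow the slide's signature, character × state → state.

```python
# Transition function from the slide, keyed by (character, state).
DELTA = {
    ("a", "s"): "x",
    ("a", "x"): "x",
    ("b", "x"): "y",
    ("b", "y"): "y",
}
START = "s"
FINAL = {"y"}


def accepts(string):
    """Run the FSA; reject as soon as no transition is defined."""
    state = START
    for ch in string:
        if (ch, state) not in DELTA:
            return False
        state = DELTA[(ch, state)]
    return state in FINAL


if __name__ == "__main__":
    for example in ["ab", "aab", "abb", "a", "ba", ""]:
        print(repr(example), accepts(example))
```

The machine accepts one or more a's followed by one or more b's.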

Page 27

Modeling English Adjectives using FSA

– from section 3.2 of textbook

• examples
– big, bigger, biggest, *unbig
– cool, cooler, coolest, coolly
– red, redder, reddest, *redly
– clear, clearer, clearest, clearly, unclear, unclearly
– happy, happier, happiest, happily
– unhappy, unhappier, unhappiest, unhappily
– real, *realer, *realest, unreal, really

• fsa (3.4)
– Initial machine is overly simple
– need more classes to make finer-grained distinctions, e.g. *unbig

Page 28

Modeling English Adjectives using FSA

• divide adjectives into classes
• examples
– adj-root2: big, bigger, biggest, *unbig
– adj-root2: cool, cooler, coolest, coolly
– adj-root2: red, redder, reddest, *redly
– adj-root1: clear, clearer, clearest, clearly, unclear, unclearly
– adj-root1: happy, happier, happiest, happily
– adj-root1: unhappy, unhappier, unhappiest, unhappily
– adj-root1: real, *realer, *realest, unreal, really

• fsa (3.5)

However... Examples
• uncooler
– Smoking uncool and getting uncooler.
– google: 22,800 (2006), 10,900 (2005)
• *realer
– google: 3,500,000 (2006), 494,000 (2005)
• *realest
– google: 795,000 (2006), 415,000 (2005)

Page 29

Modeling English Adjectives using FSA

e.g. *unbig google: 2,590 hits (2007)

• morphology is productive
• morphemes carry (compositional) meaning
• can be used for dramatic effect: unbig vs. small

Page 30

The Mapping Problem

• To map between a surface form and the decomposition of a word into its components
– e.g. root + (person/number/gender) and other features
• using spelling rules
• Example: (3.11)

Notes:
– ^ marks a morpheme boundary
– # is the end-of-word marker

Page 31

Stage 1: Lexical → Intermediate Levels

• example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)
• lexical level:
– uninflected “dictionary” level
• intermediate level:
– replace abstract morphemes by concrete ones
• key
– +N: noun
• fox can also be a verb,
• but fox +V cannot combine with +PL
– +PL: (abstract) plural morpheme
• realized in English as s (basic case)
– boundary markers ^ and #
• for use by the spelling rule machine (later)

Page 32

Stage 1: Lexical → Intermediate Levels

• example:
– f o x +N +PL (lexical)
– f o x ^s# (intermediate)
• machine idea
– character-by-character correspondences
– f → f
– o → o
– x → x
– +N → ε (ε = empty string)
– +PL → ^s#
• use a Finite State Machine with input/output mapping
– Finite State Transducer (FST)

Page 33

Stage 1: Lexical → Intermediate Levels

• Example:
– g o o s e +N +PL (lexical)
– g e e s e # (intermediate)
• Example:
– g o o s e +N +SG (lexical)
– g o o s e # (intermediate)
• Example:
– m o u s e +N +PL (lexical)
– m i c e # (intermediate)
• Example:
– s h e e p +N +PL (lexical)
– s h e e p # (intermediate)

Page 34

Stage 1: Lexical → Intermediate Levels

• 3.11

Notation:
– input : output
– f means f:f

Page 35

Extension to Finite State Transducers (FST)

• [Mealy machine extension to FSA]
– (Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} must be a finite set
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): pairs I:O
– I = input alphabet, O = output alphabet
– ε may be included in I and O
5. transition function (or matrix) δ, signature: i/o pair × state → state
– δ(a:b,s)=x
– δ(a:b,x)=x
– δ(b:a,x)=y
– δ(b:ε,y)=y

[Figure: state diagram with states s, x, y and arcs s -a:b-> x, x -a:b-> x, x -b:a-> y, y -b:ε-> y]
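To see the output tape at work, the example transducer above can be simulated by attaching an output symbol to each arc. The Python sketch below is an illustration with hypothetical names; ε is represented by the empty string, and the table is keyed by (state, input symbol) because in this particular machine ε never appears on the input side.

```python
# Arcs of the example FST: (state, input) -> (output, next state).
# The empty string stands for ε on the output side.
ARCS = {
    ("s", "a"): ("b", "x"),   # a:b
    ("x", "a"): ("b", "x"),   # a:b
    ("x", "b"): ("a", "y"),   # b:a
    ("y", "b"): ("", "y"),    # b:ε
}
START = "s"
FINAL = {"y"}


def transduce(string):
    """Return the output string, or None if the input is rejected."""
    state, output = START, []
    for ch in string:
        if (state, ch) not in ARCS:
            return None
        out, state = ARCS[(state, ch)]
        output.append(out)
    return "".join(output) if state in FINAL else None


if __name__ == "__main__":
    for example in ["ab", "aabbb", "a", "ba"]:
        print(example, "->", transduce(example))
```

Each a is rewritten as b, the first b as a, and every later b is deleted, so aabbb maps to bba.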

Page 36

Finite State Automata (FSA)

• recall: one possible Prolog encoding strategy
– define one predicate for each state
• taking one argument (the input string)
• consume input character
• call next state with remaining input string
– query
• ?- s(L). call start state s

Page 37

Finite State Automata (FSA)

– define one predicate for each state
• take one argument (the input string), and consume input character
• call next state with remaining input string
– query
• ?- s(L). i.e. call start state s
– state s: (start state)
• s([a|L]) :- x(L).
– state x:
• x([a|L]) :- x(L).
• x([b|L]) :- y(L).
– state y: (end state)
• y([]).
• y([b|L]) :- y(L).

[Figure: state diagram with states s, x, y and arcs s -a-> x, x -a-> x, x -b-> y, y -b-> y]

simple extension to FST: each predicate takes two arguments: input and output

Page 38

Stage 1: Lexical → Intermediate Levels

• example
– s0([f|L1],[f|L2]) :- s1(L1,L2).
– s0([c|L1],[c|L2]) :- s3(L1,L2).
– s1([o|L1],[o|L2]) :- s2(L1,L2).
– s2([x|L1],[x|L2]) :- s5(L1,L2).
– s3([a|L1],[a|L2]) :- s4(L1,L2).
– s4([t|L1],[t|L2]) :- s5(L1,L2).
– s5(['+N'|L1],L2) :- s6(L1,L2).
– s6(['+PL'|L1],[^,s,#|L2]) :- s7(L1,L2).
– s7([],[]). % end state
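The same fox/cat transducer can be sketched in Python as a list of arcs, each carrying a tuple of input symbols and a tuple of output symbols. Prolog can run the predicates in either direction for free; in this sketch the inverse is obtained by swapping the two tuples on every arc. The representation and the names `ARCS`, `invert` and `transduce` are assumptions made for the illustration, not part of the lecture code.

```python
# (source state, input symbols, output symbols, destination state)
ARCS = [
    ("s0", ("f",), ("f",), "s1"),
    ("s0", ("c",), ("c",), "s3"),
    ("s1", ("o",), ("o",), "s2"),
    ("s2", ("x",), ("x",), "s5"),
    ("s3", ("a",), ("a",), "s4"),
    ("s4", ("t",), ("t",), "s5"),
    ("s5", ("+N",), (), "s6"),                # +N:ε
    ("s6", ("+PL",), ("^", "s", "#"), "s7"),  # +PL:^s#
]
FINAL = {"s7"}


def invert(arcs):
    """Swap input and output labels on every arc."""
    return [(src, out, inp, dst) for (src, inp, out, dst) in arcs]


def transduce(symbols, arcs=ARCS, state="s0"):
    """Return every output sequence related to `symbols` (a list of tuples).

    Arcs with empty input are followed without consuming anything, which
    terminates here because this machine has no cycles.
    """
    symbols = tuple(symbols)
    results = []
    if not symbols and state in FINAL:
        results.append(())
    for src, inp, out, dst in arcs:
        if src == state and symbols[:len(inp)] == inp:
            for rest in transduce(symbols[len(inp):], arcs, dst):
                results.append(out + rest)
    return results


if __name__ == "__main__":
    print(transduce(["f", "o", "x", "+N", "+PL"]))
    print(transduce(["f", "o", "x", "^", "s", "#"], invert(ARCS)))
```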

Page 39

Stage 1: Lexical → Intermediate Levels

• FST queries
– lexical → intermediate
• ?- s0([f,o,x,'+N','+PL'],X).
– X = [f, o, x, ^, s, #]
– intermediate → lexical
• ?- s0(X,[f,o,x,^,s,#]).
– X = [f, o, x, '+N', '+PL']
– enumerator
• ?- s0(X,Y).
– X = [f, o, x, '+N', '+PL']
– Y = [f, o, x, ^, s, #] ;
– X = [c, a, t, '+N', '+PL']
– Y = [c, a, t, ^, s, #] ;
• No

• inversion of a transducer T: T^-1
– switch input and output labels
– in Prolog, simply change the call

Page 40

Stage 1: Lexical → Intermediate Levels

• Figure 3.17 (top half): tape view of input/output pairs

Page 41

The Mapping Problem

• Example: (3.11)
• (Context-Sensitive) Spelling Rule: (3.5) ε → e / {x,s,z}^ __ s#
– ε rewrites to letter e in left context x^ or s^ or z^ and right context s#
• i.e. insert e after the ^ when you see x^s# or s^s# or z^s#
• in particular, we have x^s# → x^es#
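The effect of the spelling rule can be previewed with a regular-expression substitution before building the transducer. The Python sketch below is a stand-in for the rule's behavior, not the FST itself: a lookbehind plays the role of the left context, a lookahead the role of the right context, and the empty match between them is replaced by e. The name `insert_e` is hypothetical.

```python
import re

# Left context: x^, s^ or z^ (lookbehind). Right context: s# (lookahead).
# The pattern matches the empty string between the two contexts.
E_INSERTION = re.compile(r"(?<=[xsz]\^)(?=s#)")


def insert_e(intermediate):
    """Apply e-insertion; forms that do not match pass through unchanged."""
    return E_INSERTION.sub("e", intermediate)


if __name__ == "__main__":
    for form in ["fox^s#", "cat^s#", "buzz^s#"]:
        print(form, "->", insert_e(form))
```

Forms that do not match the contexts come back unchanged rather than being rejected.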

Page 42

Stage 2: Intermediate → Surface Levels

• also can be implemented using an FST
• important! machine is designed to pass input not matching the rule through unmodified (rather than fail)
• implements context-sensitive rule
– q0 to q2 : left context
– q3 to q0 : right context

Page 43

Stage 2: Intermediate → Surface Levels

• Example (3.17)

Page 44

Stage 2: Intermediate → Surface Levels

• Transition table for FST in 3.14 (pg. 79)

Page 45

Stage 2: Intermediate → Surface Levels

• in Prolog (simplified)
– q0([],[]). % final state
– q0([^|L1],L2) :- !, q0(L1,L2). % ^:ε
– q0([z|L1],[z|L2]) :- !, q1(L1,L2).
– % repeat for s,x
– q0([#|L1],[#|L2]) :- !, q0(L1,L2).
– q0([X|L1],[X|L2]) :- \+ mentioned(X), q0(L1,L2). % other

• ! is known as the “cut” predicate
– it affects how Prolog backtracks for another solution
– it means “cut” the backtracking off
– Prolog will not try any other possible matching rule on backtracking

Page 46

Exercise

• Ungraded exercise:
– Implement 3.14 in Prolog
– Make sure you can do e-insertion and the inverse operation, i.e. go from surface form to intermediate form