CSA3050: Natural Language Algorithms Finite State Devices
Mar 18, 2016

kare

Transcript
Page 1: CSA3050: Natural Language Algorithms

CSA3050: Natural Language Algorithms

Finite State Devices

Page 2: CSA3050: Natural Language Algorithms

October 2005 CSA3180 NLP 2

Sources

• Blackburn & Striegnitz Ch. 2

Page 3: CSA3050: Natural Language Algorithms

Part I

Parsers and Transducers

Page 4: CSA3050: Natural Language Algorithms

Parsers vs. Recognisers

• Recognizers tell us whether a given input is accepted by some finite state automaton.

• Often we would like to have an explanation of why it was accepted.

• Parsers give us that kind of explanation.

• What form does it take?

Page 5: CSA3050: Natural Language Algorithms

Finite State Parser

• The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4].

• The standard technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.

Page 6: CSA3050: Natural Language Algorithms

Base Case

Recogniser

recognize1(Node,[ ]) :-    final(Node).

Parser

parse1(Node,[ ],[Node]) :-    final(Node).

Page 7: CSA3050: Natural Language Algorithms

Recursive Case

Recogniser

recognize1(Node1, String) :-
    arc(Node1,Node2,Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).

Parser

parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
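
The base and recursive clauses can be tried end to end on the laughing automaton. The following is a minimal sketch under stated assumptions: the initial/final/arc facts for the automaton and the three-argument traverse1 are not shown on these slides and are reconstructed here only to make the example self-contained, and testlaugh is a hypothetical helper name.

```prolog
% Assumed facts for the first laughing automaton (not shown on the slides).
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,4,'!').
arc(3,2,h).

% Assumed helper: cross an arc by consuming one matching input symbol.
traverse1(Label,[Label|Symbols],Symbols).

% Parser: the third argument collects the nodes and labels visited.
parse1(Node,[],[Node]) :-
    final(Node).
parse1(Node1,String,[Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse1(Label,String,NewString),
    parse1(Node2,NewString,Path).

% Hypothetical entry point: start parsing at the initial node.
testlaugh(Symbols,Parse) :-
    initial(Node),
    parse1(Node,Symbols,Parse).
```

With these facts, the query ?- testlaugh([h,a,h,a,'!'],Z). should bind Z to [1,h,2,a,3,h,2,a,3,'!',4], while an incomplete laugh such as [h,a,h] is rejected.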

Page 8: CSA3050: Natural Language Algorithms

Words as Labels

• So far we have only considered transitions with single-character labels.

• More complex labels are possible – e.g. words comprising several characters.

• We can construct an FSA recognizing English noun phrases that can be built from the words:

the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

Page 9: CSA3050: Natural Language Algorithms

FSA for Noun Phrases

Page 10: CSA3050: Natural Language Algorithms

FSA for NPs in Prolog

initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

Page 11: CSA3050: Natural Language Algorithms

Parsing a Noun Phrase

testparse1(Symbols,Parse) :-
    initial(Node),
    parse1(Node,Symbols,Parse).

?- testparse1([the,fast,wizard],Z).
Z=[1, the, 2, fast, 2, wizard, 3]
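
The with arc from node 3 back to node 1 lets the same parser handle noun phrases that contain a prepositional phrase. The sketch below repeats the noun phrase FSA so that it runs on its own; the three-argument traverse1 is an assumed helper that is not shown on the slides.

```prolog
% Noun phrase FSA with word labels.
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

% Assumed helper: cross an arc by consuming one matching word.
traverse1(Label,[Label|Symbols],Symbols).

parse1(Node,[],[Node]) :-
    final(Node).
parse1(Node1,String,[Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse1(Label,String,NewString),
    parse1(Node2,NewString,Path).

testparse1(Symbols,Parse) :-
    initial(Node),
    parse1(Node,Symbols,Parse).
```

For example, ?- testparse1([the,witch,with,a,fast,broomstick],Z). should give Z = [1,the,2,witch,3,with,1,a,2,fast,2,broomstick,3]: the path reaches the final node 3, returns to node 1 over with, and then parses a second noun phrase.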

Page 12: CSA3050: Natural Language Algorithms

Rewriting Categories

• It is also possible to obtain a more abstract parse, e.g.

?- testparse2([the,fast,wizard],Z).
Z=[1, det, 2, adj, 2, cn, 3]

• What changes are required to obtain this behaviour?

Page 13: CSA3050: Natural Language Algorithms

1. Changes to the FSA

% FSA
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

Page 14: CSA3050: Natural Language Algorithms

Changes to the Parser

Parse1

parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

Parse2

parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

traverse2(Cat,[Word|S],S) :-
    lex(Word,Cat).
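
Putting the category-labelled FSA, the lexicon and parse2 together gives a runnable version of the abstract parser. This is a sketch: the base clause of parse2 and the entry point testparse2 are assumed by analogy with parse1 and testparse1, since the slides do not spell them out.

```prolog
% FSA whose arcs are labelled with categories instead of words.
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon mapping words to categories.
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

% Cross a category arc by consuming any word of that category.
traverse2(Cat,[Word|S],S) :-
    lex(Word,Cat).

% Assumed base clause, mirroring parse1.
parse2(Node,[],[Node]) :-
    final(Node).
parse2(Node1,String,[Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse2(Label,String,NewString),
    parse2(Node2,NewString,Path).

% Assumed entry point, mirroring testparse1.
testparse2(Symbols,Parse) :-
    initial(Node),
    parse2(Node,Symbols,Parse).
```

With this lexicon, ?- testparse2([the,fast,wizard],Z). should give Z = [1,det,2,adj,2,cn,3]; the noun is reported as cn because that is the category name the lexicon uses for common nouns.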

Page 15: CSA3050: Natural Language Algorithms

Handling Jumps

traverse3('#',String,String).

traverse3(Cat,[Word|Words],Words) :-
    lex(Word,Cat).
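
To see the jump clause at work, the sketch below uses a small hypothetical automaton in which a '#' arc from node 1 to node 2 makes the determiner optional. The automaton, the reduced lexicon, and the names parse3 and testparse3 are illustrative assumptions, not part of the slides.

```prolog
% Hypothetical FSA: the jump arc lets the parser skip the determiner.
initial(1).
final(3).
arc(1,2,det).
arc(1,2,'#').
arc(2,2,adj).
arc(2,3,cn).

% Reduced lexicon for the example.
lex(the,det).
lex(a,det).
lex(fast,adj).
lex(wizard,cn).
lex(witch,cn).

% A jump arc consumes no input; any other arc consumes one word.
traverse3('#',String,String).
traverse3(Cat,[Word|Words],Words) :-
    lex(Word,Cat).

parse3(Node,[],[Node]) :-
    final(Node).
parse3(Node1,String,[Node1,Label|Path]) :-
    arc(Node1,Node2,Label),
    traverse3(Label,String,NewString),
    parse3(Node2,NewString,Path).

testparse3(Symbols,Parse) :-
    initial(Node),
    parse3(Node,Symbols,Parse).
```

Here ?- testparse3([fast,wizard],Z). should give Z = [1,'#',2,adj,2,cn,3], so the jump shows up in the parse as a '#' label. Note that a cycle made only of jump arcs would let this parser loop forever, so such cycles need to be avoided in the automaton.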

Page 16: CSA3050: Natural Language Algorithms

Finite State Transducers

• A finite state transducer essentially is a finite state automaton that works on two (or more) tapes.

• The most common way to think about transducers is as a kind of “translating machine” which works by reading from one tape and writing onto the other.

Page 17: CSA3050: Natural Language Algorithms

A Translator from a to b

• initial state: arrowhead

• final state: double circle

• a:b read from first tape and write to second tape

[Figure: a single state 1, both initial and final, with a loop arc labelled a:b]

Page 18: CSA3050: Natural Language Algorithms

Prolog Representation

:- op(250,xfx,:).

initial(1).
final(1).
arc(1,1,a:b).

Page 19: CSA3050: Natural Language Algorithms

Modes of Operation

• generation mode: It writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length.

• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.

• translation mode (left to right): It reads a's from the first tape and writes a b for every a that it reads onto the second tape.

• translation mode (right to left): It reads b's from the second tape and writes an a for every b that it reads onto the first tape.
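
All four modes fall out of one relational definition in Prolog, depending on which tapes are instantiated when the query is posed. The sketch below combines the a:b transducer with a five-argument traverse1 and the predicates transduce1 and testtrans1. It assumes a system such as SWI-Prolog in which : is already an infix operator; elsewhere an op/3 directive for : would be needed first.

```prolog
% One state, both initial and final, with a loop arc labelled a:b.
initial(1).
final(1).
arc(1,1,a:b).

% Cross an arc L1:L2 by taking L1 from tape 1 and L2 from tape 2.
traverse1(L1:L2,[L1|RestTape1],RestTape1,[L2|RestTape2],RestTape2).

transduce1(Node,[],[]) :-
    final(Node).
transduce1(Node1,Tape1,Tape2) :-
    arc(Node1,Node2,Label),
    traverse1(Label,Tape1,NewTape1,Tape2,NewTape2),
    transduce1(Node2,NewTape1,NewTape2).

testtrans1(Tape1,Tape2) :-
    initial(Node),
    transduce1(Node,Tape1,Tape2).

% Example queries, one per mode:
% ?- testtrans1([a,a,a],T).      translation left to right
% ?- testtrans1(T,[b,b]).        translation right to left
% ?- testtrans1([a,a],[b,b]).    recognition
% ?- testtrans1(X,Y).            generation, enumerates pairs on backtracking
```

The first query should bind T to [b,b,b] and the second should bind T to [a,a]. In generation mode the shortest pair, two empty tapes, comes first, and each further solution is one symbol longer.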

Page 20: CSA3050: Natural Language Algorithms

Part II

Computational Morphology

Page 21: CSA3050: Natural Language Algorithms

Morphology

• Morphemes: The smallest units in a word that bear some meaning, such as rabbit and s, are called morphemes.

• Morphology: The study of how morphemes combine to form words that are legal in some language.

• Two kinds of morphology
– Inflectional
– Derivational

Page 22: CSA3050: Natural Language Algorithms

Inflectional/Derivational Morphology

• Inflectional
– +s plural
– +ed past
– category preserving
– productive: always applies (esp. new words, e.g. fax)
– systematic: same semantic effect

• Derivational
– +ment
– category changing: escape+ment
– not completely productive: detractment*
– not completely systematic: apartment

Page 23: CSA3050: Natural Language Algorithms

Example: English Noun Inflections

            Regular            Irregular
Singular    cat    church      mouse   ox
Plural      cats   churches    mice    oxen

Page 24: CSA3050: Natural Language Algorithms

Morphological Parsing

Input Word: cats → Morphological Parser → Output Analysis: cat N PL

• Output is a string of morphemes

• lexeme, other meaningful morphemes

• Reversibility?

Page 25: CSA3050: Natural Language Algorithms

Morphological Parsing

• The goal of morphological parsing is to find out what morphemes a given word is built from.

cats → cat N PL
mice → mouse N PL
foxes → fox N PL

Page 26: CSA3050: Natural Language Algorithms

Morphological Analysis with FSTs

• Basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word or vice versa.

• Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.

Page 27: CSA3050: Natural Language Algorithms

Plural Nouns in English

• Regular Forms
– add an s, as in wizard+s.
– add -es, as in witch+es.

• Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative.

• Irregular forms
– mouse/mice
– automaton/automata

• Handled on a case-by-case basis

• Requires a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.

Page 28: CSA3050: Natural Language Algorithms

2 Steps

1. Split word up into its possible components, using + to indicate possible morpheme boundaries.

cats → cat + s
foxes → fox + s
mice → mouse + s

2. Look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes.

cat + s → cat N PL
fox + s → fox N PL
mouse + s → mouse N PL

Page 29: CSA3050: Natural Language Algorithms

Step 1

• Transducer may or may not insert a ‘+’ (morpheme boundary) if the word ends in ‘s’.

• If the word ends in ses, xes, or zes, it may delete the ‘e’ when inserting the morpheme boundary, e.g. foxes → fox + s

Page 30: CSA3050: Natural Language Algorithms

Transducer for Step 1: Surface to Intermediate

Page 32: CSA3050: Natural Language Algorithms

Prolog Representation

• The transducer specifications we have seen translate easily into Prolog format except for the other transition.

• arc(1,3,z:z).
arc(1,3,s:s).
arc(1,3,x:x).
arc(1,2,'#':'+').
arc(3,1,<other>).
arc(1,1,<other>).

Page 33: CSA3050: Natural Language Algorithms

One Way to Handle <other> arcs

arc(1,3,z:z).
arc(1,3,s:s).
arc(1,3,x:x).
arc(1,2,'#':'+').
arc(3,1,a:a).
arc(3,1,b:b).
arc(3,1,c:c).
: etc
: etc
arc(3,1,y:y).
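
Listing one fact per letter works but is verbose. A possible alternative, sketched below, is to define the <other> arcs by rule. It assumes that <other> means any lowercase letter except s, x and z; the predicates special/1 and other/1 are hypothetical names introduced only for this illustration, and the sketch assumes a system such as SWI-Prolog where : is already an infix operator.

```prolog
:- use_module(library(lists)).

% Assumption: the letters that have their own arcs in the transducer.
special(s).
special(x).
special(z).

% Hypothetical helper: a lowercase letter that is not special.
other(C) :-
    member(C,[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z]),
    \+ special(C).

arc(1,3,z:z).
arc(1,3,s:s).
arc(1,3,x:x).
arc(1,2,'#':'+').
% Identity arcs for every other letter, generated by rule.
arc(3,1,C:C) :- other(C).
arc(1,1,C:C) :- other(C).
```

Because other/1 enumerates letters through member/2, the rule-based arcs behave like the enumerated facts both when the letter is known and when it is left unbound.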

Page 34: CSA3050: Natural Language Algorithms

Transducer for Step 2: Intermediate to Morphemes

Possible inputs to the transducer are:

• Regular noun stem: cat

• Regular noun stem + s: cat+s

• Singular irregular noun stem: mouse

• Plural irregular noun stem: mice

Page 35: CSA3050: Natural Language Algorithms

2. Intermediate to Morphemes Transducer

Page 36: CSA3050: Natural Language Algorithms

Handling Stems

cat /cat

mice/mouse

Page 37: CSA3050: Natural Language Algorithms

Completed Stage 2

Page 38: CSA3050: Natural Language Algorithms

Joining Stages 1 and 2

• If the two transducers run in a cascade (i.e. we let the second transducer run on the output of the first one), we can do a morphological parse of (some) English noun phrases.

• We can also change the direction of translation (in translation mode).

• This transducer can also be used for generating a surface form from an underlying form.

Page 39: CSA3050: Natural Language Algorithms

Combining Rules

• Consider the word “berries”.

• Two rules are involved
– berry + s
– y → ie under certain circumstances.

• Combinations of such rules can be handled in two ways
– Cascade, i.e. sequentially
– Parallel

• Algorithms exist for combining transducers together in series or in parallel.

• Such algorithms involve computations over regular relations.

Page 40: CSA3050: Natural Language Algorithms

3 Related Frameworks

REGULAR LANGUAGES

REGULAR EXPRESSIONS

FSA

Page 41: CSA3050: Natural Language Algorithms

Concatenation over FS Automata

[Figure: automata with arcs labelled a, b, c and d]

Page 42: CSA3050: Natural Language Algorithms

REGULAR RELATIONS

REGULAR RELATIONS

AUGMENTED REGULAR EXPRESSIONS

FINITE STATE TRANSDUCERS

Page 43: CSA3050: Natural Language Algorithms

Putting it all together

execution of FSTi takes place in parallel

Page 44: CSA3050: Natural Language Algorithms

Kaplan and Kay
The Xerox View

FSTi are aligned but separate

FSTi intersected together

Page 45: CSA3050: Natural Language Algorithms

Summary

• Morphological processing can be handled by finite state machinery

• Finite State Transducers are formally very similar to Finite State Automata.

• They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.

Page 46: CSA3050: Natural Language Algorithms

Exercises

• Change the representation of automata to allow them to be given names.

• Make the corresponding changes to the transducer.

• Write a predicate which allows two named automata to be composed – i.e. the output of one becomes the input of the other

Page 47: CSA3050: Natural Language Algorithms

Simple Transducer in Prolog

transduce1(Node,[ ],[ ]) :-
    final(Node).

transduce1(Node1,Tape1,Tape2) :-
    arc(Node1,Node2,Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2,NewTape1,NewTape2).

Page 48: CSA3050: Natural Language Algorithms

Traverse for FST

traverse1(L1:L2,
          [L1|RestTape1], RestTape1,
          [L2|RestTape2], RestTape2).

testtrans1(Tape1,Tape2) :-
    initial(Node),
    transduce1(Node,Tape1,Tape2).

Page 49: CSA3050: Natural Language Algorithms

Transducers and Jumps

• Transducers can make jumps going from one state to another without doing anything on either one or on both of the tapes.

• So, transitions of the form a:# or #:a or #:# are possible.

Page 50: CSA3050: Natural Language Algorithms

Handling Jumps: 4 cases

• Jump on both tapes.

• Jump on the first but not on the second tape.

• Jump on the second but not on the first tape.

• Jump on neither tape (this is what traverse1 does).

Page 51: CSA3050: Natural Language Algorithms

4 Corresponding Clauses

traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).
traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).
traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).
traverse2(L1:L2,[L1|RestTape1],RestTape1,[L2|RestTape2],RestTape2).
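
A jump arc such as '#':'+' is what allows a transducer to write a morpheme boundary without reading anything. The sketch below is a deliberately simplified, hypothetical machine (not the Step 1 transducer itself): it copies letters from a tiny assumed alphabet and may insert a + before a final s. The driver transduce2 and the entry point testtrans2 are assumed by analogy with transduce1 and testtrans1, and : is assumed to be an infix operator already, as in SWI-Prolog.

```prolog
:- use_module(library(lists)).

% Hypothetical toy transducer over the letters a, c, t, s.
initial(1).
final(1).
final(3).
arc(1,1,C:C) :- member(C,[a,c,t,s]).   % copy a letter unchanged
arc(1,2,'#':'+').                      % jump on tape 1, write + on tape 2
arc(2,3,s:s).                          % copy the final s

% The four cases: jump on both, first only, second only, neither.
traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).
traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).
traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).
traverse2(L1:L2,[L1|RestTape1],RestTape1,[L2|RestTape2],RestTape2).

% Assumed driver, analogous to transduce1.
transduce2(Node,[],[]) :-
    final(Node).
transduce2(Node1,Tape1,Tape2) :-
    arc(Node1,Node2,Label),
    traverse2(Label,Tape1,NewTape1,Tape2,NewTape2),
    transduce2(Node2,NewTape1,NewTape2).

testtrans2(Tape1,Tape2) :-
    initial(Node),
    transduce2(Node,Tape1,Tape2).
```

The query ?- findall(T, testtrans2([c,a,t,s],T), Ts). should give Ts = [[c,a,t,s],[c,a,t,+,s]]: one analysis without a boundary and one with it, reflecting that the boundary may or may not be inserted. The last traverse2 clause also matches labels that contain '#', so when a tape is left unbound it can produce a literal '#' symbol; the example therefore runs with the first tape instantiated.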

Page 52: CSA3050: Natural Language Algorithms

FST in Prolog

lex(wizard:wizard,'STEM-REG1').
lex(witch:witch,'STEM-REG2').
lex(automaton:automaton,'IRREG-SG').
lex(automata:'automaton-PL','IRREG-PL').
lex(mouse:mouse,'IRREG-SG').
lex(mice:'mouse-PL','IRREG-PL').
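
One way such entries could be consulted is to label arcs with lexical categories and let the traversal step look up a matching pair of input and output words. The sketch below is only an illustration of that idea: the one-arc-per-category automaton and the names lex_traverse, transduce3 and testtrans3 are hypothetical and are not taken from the slides. It assumes a system such as SWI-Prolog where : is already an infix operator.

```prolog
% Lexicon entries: InputWord:OutputAnalysis, Category.
lex(wizard:wizard,'STEM-REG1').
lex(witch:witch,'STEM-REG2').
lex(automaton:automaton,'IRREG-SG').
lex(automata:'automaton-PL','IRREG-PL').
lex(mouse:mouse,'IRREG-SG').
lex(mice:'mouse-PL','IRREG-PL').

% Hypothetical automaton: one arc per category, over whole words.
initial(1).
final(2).
arc(1,2,'STEM-REG1').
arc(1,2,'STEM-REG2').
arc(1,2,'IRREG-SG').
arc(1,2,'IRREG-PL').

% Assumed traversal: cross a category arc with any word pair
% that the lexicon lists under that category.
lex_traverse(Cat,[In|Rest1],Rest1,[Out|Rest2],Rest2) :-
    lex(In:Out,Cat).

transduce3(Node,[],[]) :-
    final(Node).
transduce3(Node1,Tape1,Tape2) :-
    arc(Node1,Node2,Cat),
    lex_traverse(Cat,Tape1,NewTape1,Tape2,NewTape2),
    transduce3(Node2,NewTape1,NewTape2).

testtrans3(Tape1,Tape2) :-
    initial(Node),
    transduce3(Node,Tape1,Tape2).
```

With these definitions, ?- testtrans3([mice],X). should give X = ['mouse-PL'], and the reverse query ?- testtrans3(X,['automaton-PL']). should give X = [automata].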