CSA3050: Natural Language Algorithms Finite State Devices
Dec 30, 2015
October 2004 CSA3050 NLP Algorithms 2
Sources
• Blackburn & Striegnitz Ch. 2
Parsers vs. Recognisers
• Recognizers tell us whether a given input is accepted by some finite state automaton.
• Often we would like to have an explanation of why it was accepted.
• Parsers give us that kind of explanation.
• What form does it take?
Finite State Parser
• The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4].
• The technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.
Base Case
Recogniser
recognize1(Node,[ ]) :- final(Node).
Parser
parse1(Node,[ ],[Node]) :- final(Node).
Recursive Case
Recogniser
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).
Parser
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
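To run the recogniser and parser end to end, the clauses need an automaton and a definition of traverse1, neither of which appears on these slides. The following is a minimal sketch under two assumptions: the arcs of the laughing automaton are reconstructed from the example path [1,h,2,a,3,h,2,a,3,!,4], and traverse1 simply consumes one input symbol equal to the arc label.

```prolog
% Laughing automaton. The arcs are inferred from the example path
% [1,h,2,a,3,h,2,a,3,!,4] and are an assumption, not taken from a slide.
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,2,h).
arc(3,4,'!').

% Assumed definition: an arc is traversed by consuming one matching symbol.
traverse1(Label, [Label|Symbols], Symbols).

% Recogniser: succeeds if the string leads from Node to a final node.
recognize1(Node, []) :- final(Node).
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).

% Parser: same control flow, with a third argument collecting the path.
parse1(Node, [], [Node]) :- final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
```

With these clauses loaded, the query ?- parse1(1,[h,a,h,a,'!'],P). binds P to [1,h,2,a,3,h,2,a,3,!,4], while ?- recognize1(1,[h,a,h]). fails because the input stops in a non-final node.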
Complex Labels
• So far we have only considered transitions with single-character labels.
• More complex labels are possible – e.g. symbols comprising several characters.
• We can construct an FSA recognizing English noun phrases that can be built from the words:
the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.
FSA for Noun Phrases
FSA for NPs in Prolog
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).
Parsing a Noun Phrase
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
?- testparse1([the,fast,wizard],Z).
Z = [1, the, 2, fast, 2, wizard, 3]
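The noun phrase automaton also loops back from node 3 to node 1 on the word with, so longer phrases are accepted too. The sketch below bundles the FSA, the parser and testparse1 into one loadable file; the three-argument traverse1 is an assumed definition (consume one symbol equal to the arc label), since the slides do not show it.

```prolog
% FSA for noun phrases, as given on the slide.
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

% Assumed definition: whole words are treated as single symbols.
traverse1(Label, [Label|Symbols], Symbols).

parse1(Node, [], [Node]) :- final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
```

For example, ?- testparse1([harry,with,the,fast,broomstick],Z). gives Z = [1,harry,3,with,1,the,2,fast,2,broomstick,3], whereas ?- testparse1([the,harry],Z). fails because no arc leaving node 2 is labelled harry.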
Rewriting Categories
• It is also possible to obtain a more abstract parse, e.g.
?- testparse2([the,fast,wizard],Z).
Z = [1, det, 2, adj, 2, cn, 3]
• What changes are required to obtain this behaviour?
1. Changes to the FSA
% FSA
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).
% Lexicon
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).
Changes to the Parser
Parse1
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
Parse2
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).
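Putting the category-labelled FSA, the lexicon and the modified parser together gives a complete program. This is a sketch: the base case of parse2 and the wrapper testparse2 are not shown on the slides and are assumed here by analogy with parse1 and testparse1.

```prolog
% FSA over categories, as on the slide.
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon mapping words to categories, as on the slide.
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

% Base case: assumed, mirroring the base case of parse1.
parse2(Node, [], [Node]) :- final(Node).
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

% An arc labelled with a category consumes any word of that category.
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).

% Wrapper: assumed, mirroring testparse1.
testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).
```

The labels in the parse come straight from the lexicon, so with this lexicon ?- testparse2([the,fast,wizard],Z). yields Z = [1,det,2,adj,2,cn,3], with the common noun reported as cn.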
Handling Jumps
traverse3('#',String,String).
traverse3(Cat,[Word|Words],Words) :- lex(Word,Cat).
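The first traverse3 clause lets an arc labelled '#' be followed without consuming input. As an illustration only, the sketch below adds a hypothetical jump arc from node 1 to node 2 to the category FSA, which makes the determiner optional. The jump arc, the reduced lexicon, parse3 and testparse3 are all assumptions introduced for this example.

```prolog
% Category FSA with one extra, hypothetical jump arc (1 -> 2 on '#').
initial(1).
final(3).
arc(1,2,det).
arc(1,2,'#').      % hypothetical: skip the determiner
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Reduced lexicon, enough for the queries below.
lex(the,det).
lex(fast,adj).
lex(wizard,cn).
lex(harry,pn).
lex(with,prep).

% Jump: leave the input untouched.
traverse3('#', String, String).
% Ordinary arc: consume one word of the right category.
traverse3(Cat, [Word|Words], Words) :- lex(Word, Cat).

% parse3 is assumed by analogy with parse2.
parse3(Node, [], [Node]) :- final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).
```

Here ?- testparse3([fast,wizard],Z). gives Z = [1,#,2,adj,2,cn,3], so the jump shows up in the parse as a '#' label. A cycle consisting only of jump arcs would let the parser loop without consuming input, so such cycles are best avoided in this simple scheme.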
Finite State Transducers
• A finite state transducer is essentially a finite state automaton that works on two (or more) tapes.
• The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other.
A Translator from a to b
• initial state: arrowhead
• final state: double circle
• a:b: read a from the first tape and write b to the second tape
October 2004 CSA3050 NLP Algorithms 17
Prolog Representation
:- op(250,xfx,:).
initial(1).
final(1).
arc(1,1,a:b).
Modes of Operation
• generation mode: It writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length.
• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.
• translation mode (left to right): It reads a's from the first tape and writes a b onto the second tape for every a that it reads.
• translation mode (right to left): It reads b's from the second tape and writes an a onto the first tape for every b that it reads.
Transducers and Jumps
• Transducers can make jumps going from one state to another without doing anything on either one or on both of the tapes.
• So, transitions of the form a:# or #:a or #:# are possible.
Simple Transducer in Prolog
transduce1(Node, [], []) :- final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).
Traverse for FST
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
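Because both tapes are ordinary Prolog arguments, the same clauses cover all four modes of operation listed earlier; which mode you get depends only on which arguments are instantiated in the query. The sketch below assembles the a-to-b translator into one program. It leaves out the op(250,xfx,:) directive on the assumption that ':' is already an infix operator, as it is in SWI-Prolog; restore the directive if your Prolog system needs it.

```prolog
% The a-to-b translator: one state, one looping arc.
% Assumption: ':' is already an infix operator in the Prolog system used.
initial(1).
final(1).
arc(1,1,a:b).

transduce1(Node, [], []) :- final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

% Read L1 from the first tape and L2 from the second tape.
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
```

Translation left to right: ?- testtrans1([a,a,a],T). gives T = [b,b,b]. Translation right to left: ?- testtrans1(T,[b,b]). gives T = [a,a]. Recognition: ?- testtrans1([a,a],[b,b]). succeeds and ?- testtrans1([a,a],[b]). fails. Generation: ?- testtrans1(X,Y). first answers X = [], Y = [] and produces longer pairs on backtracking.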
Handling Jumps: 4 cases
• Jump on both tapes.
• Jump on the first but not on the second tape.
• Jump on the second but not on the first tape.
• Jump on neither tape (this is what traverse1 does).
4 Corresponding Clauses
traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).
traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).
traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
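To see the jump clauses at work, consider a small hypothetical transducer that rewrites every a as the two symbols b c: the first arc reads a and writes b, and the second arc jumps on the first tape while writing c. The automaton, transduce2 and testtrans2 are assumptions made for this sketch; only the four traverse2 clauses come from the slide. As before, ':' is assumed to be a predefined infix operator.

```prolog
% Hypothetical transducer: a on tape 1 corresponds to b c on tape 2.
initial(1).
final(1).
arc(1,2,a:b).
arc(2,1,'#':c).    % jump on tape 1, write c on tape 2

% transduce2 is assumed by analogy with transduce1.
transduce2(Node, [], []) :- final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).

% The four cases from the slide.
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans2(Tape1, Tape2) :-
    initial(Node),
    transduce2(Node, Tape1, Tape2).
```

With this program ?- testtrans2([a,a],T). gives T = [b,c,b,c], and in the other direction ?- testtrans2(X,[b,c]). gives X = [a].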
Morphological Analysis with FSTs
• Morphology is concerned with the internal structure of words.
– How can a word be decomposed into morphemes?
– How do the morphemes combine?
– What are legitimate combinations?
• The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa.
• Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.
Plural Nouns in English
• Regular forms
– add an s, as in wizard+s
– add es, as in witch+es
• Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative.
• Irregular forms
– mouse/mice
– automaton/automata
• Handled on a case-by-case basis.
• We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.
FST for English Plurals
FST in Prolog
lex(wizard:wizard,'STEM-REG1').
lex(witch:witch,'STEM-REG2').
lex(automaton:automaton,'IRREG-SG').
lex(automata:'automaton-PL','IRREG-PL').
lex(mouse:mouse,'IRREG-SG').
lex(mice:'mouse-PL','IRREG-PL').
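The lexicon alone does not run; it needs arcs and a traversal predicate. Since the FST diagram is not reproduced in this text, the following is only a sketch of how the pieces could fit together. The state numbers, arcs, final states and the predicates traverse, transduce and analyse are hypothetical. Tapes are taken to be lists of morpheme symbols, so the + boundary is implicit, and ':' is assumed to be a predefined infix operator.

```prolog
% Lexicon from the slide: lex(Surface:Lexical, Category).
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').

% Hypothetical automaton; the real diagram is not available here.
initial(1).
final(2).          % regular stem with no suffix (singular)
final(3).
final(4).
arc(1,2,'STEM-REG1').
arc(1,3,'STEM-REG2').
arc(2,4,s:'PL').   % wizard + s
arc(3,4,es:'PL').  % witch + es
arc(1,4,'IRREG-SG').
arc(1,4,'IRREG-PL').

% Category label: look up a surface/lexical pair in the lexicon.
traverse(Cat, [S|Surf], Surf, [L|Lex], Lex) :- lex(S:L, Cat).
% Pair label: read S on the surface tape, L on the lexical tape.
traverse(S:L, [S|Surf], Surf, [L|Lex], Lex).

transduce(Node, [], []) :- final(Node).
transduce(Node1, Surf, Lex) :-
    arc(Node1, Node2, Label),
    traverse(Label, Surf, Surf1, Lex, Lex1),
    transduce(Node2, Surf1, Lex1).

analyse(Surface, Lexical) :-
    initial(Node),
    transduce(Node, Surface, Lexical).
```

Under these assumptions ?- analyse([wizard,s],L). gives L = [wizard,'PL'] and ?- analyse([mice],L). gives L = ['mouse-PL']. Running the same predicate in the other direction, ?- analyse(S,[witch,'PL']). gives S = [witch,es], while ?- analyse([wizard,es],L). fails because only s is allowed after a STEM-REG1 noun.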