Introduction to Morphology
Linguistics for Computer Scientists, Session 4
Antske Fokkens
Department of Computational Linguistics, Saarland University
03 October 2009
Outline
1 Introduction to Morphology
   Introduction
   What are morphemes?
2 Subdomains of Morphology
3 Properties of Morphemes
   Morphemes and their shapes
   Morphological Processes
4 Morphology in Computational Linguistics
   Automata
   Finite State Transducers
What is Morphology?
Morphology is the study of form and structure.
In linguistics, it generally refers to the study of the form and structure of words.
What is morphology?
The term morphology can refer to three different things:
a Description of the behaviour of morphemes and how they are combined.
b Derivational, inflectional and compositional processes of word formation occurring in a specific language,
  e.g. “German has a richer morphology than English”
c Description of such word formation processes.
What are Morphemes?
Morphemes
Morphemes are minimal meaning-bearing units:
  e.g. talked contains two morphemes: talk and -ed (past).
Form-function pairs (sound/sign-meaning)
Basic units of morphology
Morphemes are the “building blocks” of phrases
Why study morphology? (1/2)
One of the main properties of language is its sound/meaning pairs
When analyzing language (or learning a foreign language), we can’t simply list all expressions: there is an infinite number of them!
So we decompose expressions into smaller units: usually into phrases and words (syntax)
Why study morphology? (2/2)
Can we use words as basic sound/meaning units?
Problems:
1 The definition of words is unclear
2 Words can be composed of many components that contribute to meaning and/or grammar
Several applications in Computational Linguistics benefit from morphological analysis (more later)
Words and Morphemes
There are two main usages of the term word :
1 Surface form (spoken or written representation)
2 Abstract form (lemma or dictionary entry, e.g. bare infinitives in English, nominative singular form of nouns in Latin)
The class of forms representing a word in different contexts is called a lexeme,
  e.g. sing = {sing, sings, sang, sung, singing}
Based on Crysmann 2006
A definition of words?
Words can be described as units of language (either sequences of sounds, or signs) that function as meaning bearers. But this is a fuzzy notion, e.g.:
talked in she talked expresses both “talking” and past tense.
Is more or less one word, or is it three words?
A structuralist solution: morphemes
A language:
11-112 phonemes
↓
4,000-10,000 morphemes
↓
An infinite number of sentences
Morphs and Morphological Analysis
The realisations of morphemes are called morphs:
e.g. English plural morpheme [NUMBER pl]: -s, -es, -en, -∅
  boy-s, box-es, ox-en, sheep
These different realisations of the same morpheme are called allomorphs.
Morphological analysis
Segmentation of expressions into basic units (mostly starting from word level).
Classification of these basic units according to function.
Based on Crysmann 2006
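The two steps of morphological analysis (segmentation, then classification) can be illustrated with a minimal Python sketch for the English plural allomorphs listed above. The toy lexicon, the function name `analyse` and the feature labels are assumptions made for this illustration; a real analyser would not list stems by hand.

```python
# Toy lexicon (illustrative assumption): each noun stem is listed with the
# plural allomorph it selects. "" stands for the zero morph.
PLURAL_ALLOMORPH = {"boy": "s", "box": "es", "ox": "en", "sheep": ""}


def analyse(word):
    """Return all (stem, morph, features) analyses of a surface form."""
    analyses = []
    for stem, morph in PLURAL_ALLOMORPH.items():
        if word == stem:
            # Segmentation finds a bare stem: classify it as singular ...
            analyses.append((stem, "", "NUMBER sg"))
            # ... and also as plural when the plural morph is zero.
            if morph == "":
                analyses.append((stem, "∅", "NUMBER pl"))
        elif morph and word == stem + morph:
            # Segmentation into stem + overt plural morph.
            analyses.append((stem, "-" + morph, "NUMBER pl"))
    return analyses


print(analyse("boxes"))
print(analyse("sheep"))
```

Note that sheep receives two analyses: the zero allomorph makes the surface form ambiguous between singular and plural.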
Types of morphemes
Free Morphemes
Free morphemes can occur independently. Free morphemes are common in both English and German.
e.g. boy, sing
Bound Morphemes
Bound morphemes must be attached to another morpheme, and cannot be used independently.
e.g. [NUMBER pl] -s → boys
Based on Crysmann 2006
Types of bound morphemes
Typical bound morphemes are:
affixes (boy+s, talk+ed )
clitics (French: je ne sais pas; je and ne cannot occur without a verb)
roots (Spanish habl- needs an ending indicating person, number, mood, etc.)
Formatives
Morphemes are form-meaning pairs, but not all segmental forms have an identifiable meaning:
Formatives are forms without identifiable meaning
e.g. linking elements in German compounds: Geburt+s+tag (birthday), Schwan+en+hals (swan neck).
Based on Crysmann 2006
Pseudo Morphemes
Pseudo-morphemes or cranberry morphemes are special cases of formatives.
They are segmentable parts of a complex word, but do not have an independent meaning:
e.g.
cran+berry, rasp+berry
re+ceive, con+ceive
Based on Crysmann 2006
Areas of Morphology
We distinguish:
Word formation:
  Derivational morphology
  Compounding
Inflection
Derivational Morphology
Derivational morphology builds complex words by combining bound and free morphemes.
Derivational operations are by definition optional, i.e. not required by syntactic criteria.
In inflection, the meaning or, at least, the general concept is (generally) not changed, though when, who or what and sometimes where, how and whether may be specified by inflectional morphemes.
There are bound and free inflectional morphemes:
  go [TENSE past]: went
  go [TENSE future]: will go
Based on Crysmann 2006
Inflection — paradigm
Inflectional morphology is typically organised in paradigms.
Paradigm
“A set of forms having the same root/stem, one of which must be selected in a certain syntactic environment” (definition based on [Crystal(1997)] (p. 277) and [Payne(1997)] (p. 26))
Some Basic Notions
Root: an unanalysable form, expressing the basic lexical content of a word. Also defined as ’what is left of a complex form when all affixes are stripped’.
Stem: consists of at least a root. It can contain one or more derivational affixes. In inflectional morphology, stem is generally defined as the root + a thematic vowel.
Base: a form to which an affix may be added. A base may be simplex (root) or complex (root + affixes).
Suppletion refers to ’stem replacement’: a verb has more than one stem, each used in different contexts.
In many European languages, suppletion occurs with the verb ’to be’, e.g. in English, the verb uses three historically different roots:
  am, are, is
  was, were
  be
(Payne, 1997)
Subtractive Morphology (1/2)
Subtractive morphology means that part of the stem is omitted to mark a morphological process.
For instance Koasati (a Muskogean language, spoken in the US):
  Singular           Plural          Gloss
  pitaf-fi-n         pit-li-n        to slice up the middle
  lasap-li-n         las-li-n        to lick something
  acokcana:-kaln     acokcan-ka-n    to quarrel with someone
  obakhitip-li-in    obakhit-li-n    to go backwards
Data taken from Sproat (1992)
Subtractive Morphology (2/2)
The shape of the base cannot be predicted from the derived form
Subtractive morphology is problematic for theories assuming that morphology consists of the addition of morphemes
Based on Crysmann 2006
Reduplication
Reduplicated morphemes are formed by reduplicating (part of) the base.
In total reduplication the entire base is copied, though minor changes may occur, e.g. ([Kiparsky(1987)], p. 115-117):
Indonesian:
  orang ’man’      orang orang ’men’
Javanese:
  Base    Habitual-Repetitive    Gloss
  bali    bola bali              ’return’
  udan    udan udεn              ’rain’
Based on Crysmann 2006
Suprasegmental Marking
Stress
English verb-noun derivations (stressed syllable in capitals):
  Verb        Noun
  proDUCE     PROduce
  perMIT      PERmit
  imPORT      IMport
  inSULT      INsult
  disCOUNT    DIScount
[Chomsky and Halle(1968)] propose phonological rules to derive “surface” morphemes in The Sound Pattern of English (SPE)
They were formalized as (ordered) context-sensitive rewrite rules:
  a → b / v _ w
  e.g. iN- → im- / _ m
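A rule of the form a → b / v _ w rewrites a as b only between the left context v and the right context w. The following Python sketch shows one possible way to apply such rules in order, using regular-expression lookarounds. The function names, the hyphen as a morpheme boundary, and the second (default) rule N → n are assumptions added for illustration; they are not taken from SPE.

```python
import re


def apply_rule(form, a, b, left="", right=""):
    """Apply the rewrite rule a -> b / left _ right to every match in form."""
    pattern = ""
    if left:
        pattern += "(?<=" + re.escape(left) + ")"   # left context v
    pattern += re.escape(a)                          # the rewritten segment a
    if right:
        pattern += "(?=" + re.escape(right) + ")"    # right context w
    return re.sub(pattern, lambda match: b, form)


def derive(underlying, rules):
    """Apply an ordered list of (a, b, left, right) rules one after another."""
    form = underlying
    for a, b, left, right in rules:
        form = apply_rule(form, a, b, left, right)
    return form


# Rule 1 is the example from the slide; rule 2 is a hypothetical default
# that realises any remaining N as n.
RULES = [("iN-", "im-", "", "m"), ("N", "n", "", "")]

print(derive("iN-mobile", RULES))
print(derive("iN-active", RULES))
```

The ordering matters: if the default rule were applied first, the context-sensitive rule would never find an N to rewrite.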
(Morpho)phonological rules
There was a strong belief that related morphemes are all derived from the same underlying representation, even if this form never occurs on the surface (e.g. divine and divinity would come from the root divIn)
The approach did not take general phonetic constraints within the language into account, nor did it address rules and tendencies in morpheme structures
Declension of puella
Latin declension of a noun of the first declension:
        Singular   Plural
  NOM   puella     puellae
  GEN   puellae    puellarum
  DAT   puellae    puellis
  ACC   puellam    puellas
  ABL   puella     puellis
syncretism: the same form is used to express different feature combinations.
e.g. in the declension of puella:
  -ae: GEN or DAT singular, or NOM plural
  -a: NOM or ABL singular
  -is: DAT or ABL plural
exponence: the relation between form and function is m:n:
  multi-exponence (cumulation): one form expresses several functions.
    Here: -am expresses both accusative and singular
  extended exponence: in ge-dehn-t, ge- and -t express one function together.
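Syncretism can be detected mechanically by inverting a paradigm, i.e. by mapping each form back to the feature combinations it expresses. The sketch below does this for puella in Python. The paradigm table is the standard first-declension pattern written without vowel length and without the vocative (both simplifications are assumptions of this illustration), and the function name `syncretic_forms` is hypothetical.

```python
from collections import defaultdict

# Paradigm of puella: (CASE, NUMBER) -> form. Vowel length is ignored,
# so NOM sg and ABL sg look identical, as in the list above.
PUELLA = {
    ("NOM", "sg"): "puella",  ("NOM", "pl"): "puellae",
    ("GEN", "sg"): "puellae", ("GEN", "pl"): "puellarum",
    ("DAT", "sg"): "puellae", ("DAT", "pl"): "puellis",
    ("ACC", "sg"): "puellam", ("ACC", "pl"): "puellas",
    ("ABL", "sg"): "puella",  ("ABL", "pl"): "puellis",
}


def syncretic_forms(paradigm):
    """Map each form that fills more than one paradigm cell to its cells."""
    cells = defaultdict(list)
    for features, form in paradigm.items():
        cells[form].append(features)
    return {form: feats for form, feats in cells.items() if len(feats) > 1}


for form, feats in syncretic_forms(PUELLA).items():
    print(form, feats)
```

For an analyser, every syncretic form is a source of ambiguity: puellae alone cannot tell GEN sg, DAT sg and NOM pl apart.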
Morphological Properties — Synthesis
Synthesis: the number of morphemes that tend to occur within a word.
In isolating languages words tend to consist of only one morpheme (e.g. Chinese languages).
Polysynthetic languages are known for the large number of morphemes that may occur in a single word, for instance the Quechua and Inuit languages. The following example is from Yup’ik:
’He had not yet said again that he was going to hunt reindeer’
([Payne(1997)], p. 28)
Morphological Properties — Fusion (1/2)
Fusion: the number of meaning units that are found in one morphological shape:
Agglutinative languages have little fusion: each meaning component is represented by its own morpheme (e.g. Turkish).
Fusional languages have morphemes that express many meaning units: e.g. -ó in Spanish habló expresses indicative mood, 3rd person, singular, past tense and perfective aspect.
Morphological Properties — Fusion (2/2)
In English, examples of both agglutinative and fusional morphemes can be found:
agglutinative: anti+dis+establish+ment+arian+ism
fusion: vowel change in plural formation (goose/geese) and strong verbs (sing/sang).
  Individual morphemes (root and number/tense) cannot be segmented into chunks; therefore these forms are fusional.
Morphology in Computational Linguistics
Morphology-related applications in computational linguistics are:
1 Analysing complex words, defining their component parts:
  anti+dis+establish+ment+arian+ism
2 Analysis of grammatical information, encoded in words:
  sings → sing [PERSON 3, NUMBER singular, TENSE present]
Morphological Processing
Inflection
  lemmatisation/stemming
  extraction of grammatical (morpho-syntactic) features (preprocessing for parsing)
  State of the art: finite state technology (to be discussed)
Reduction of lexicon size (English 2:1, German 5:1, Finnish/Turkish >200:1) (Crysmann 2006)
Morphological Processing (cont)
Derivational Morphology
  Semi-productivity is still a challenge
  Rule-based approaches tend to suffer from over-generation
Compound Analysis
  Important for languages with productive compounding
  Additional task: bracketing
Why do we need morphology?
For linguistic tools, such as parsers: significant reduction of lexicon size
For statistical methods: reduces unseen data. In a morphologically rich language, many words will not be found in each possible form, even in a large training corpus.
Machine translation runs into problems, in particular when translating from a morphologically poor to a morphologically rich language. This is expected to become a ’hot topic’ in MT
State of the art: Finite State Transducers
Non-deterministic Finite Automata (NFA)
Definition
A non-deterministic finite automaton is a quintuple (Q, Σ, δ, q0, F), where
  Q is a finite set of states
  Σ is a finite set of symbols
  δ is a transition function δ : Q × (Σ ∪ {ε}) → 2^Q, mapping a state and a symbol (or ε) to the set of possible next states; δ(qi, σ) = ∅ if σ is not licit at state qi
  q0 ∈ Q is a unique initial state
  F ⊆ Q is a set of final states
In the worst case, recognition with an NFA by backtracking takes time exponential in the word length
Based on Crysmann 2006
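An example of an NFA
German adjectives
klein+ er+ es
[Figure: NFA over states 1 to 4 for German adjective endings, with edges labelled er, st, e, n, m, r, s and three ε-edges. The slide animation steps through an input, showing failure, backtracking and finally acceptance.]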
Based on Crysmann 2006
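The failure and backtracking steps of the adjective example can be sketched as a depth-first search in Python. The transition table below is one reading of the figure (which edge connects which pair of states is an assumption, since only the edge labels survive in this transcript), and the names `NFA`, `FINAL` and `accepts` are chosen for this illustration.

```python
# Edges: state -> list of (label, next_state). The empty label "" is an
# ε-edge. Assumed topology: 1 -er|st|ε-> 2, 2 -e-> 3, 2 -ε-> 4,
# 3 -n|m|r|s|ε-> 4.
NFA = {
    1: [("er", 2), ("st", 2), ("", 2)],
    2: [("e", 3), ("", 4)],
    3: [("n", 4), ("m", 4), ("r", 4), ("s", 4), ("", 4)],
    4: [],
}
FINAL = {4}


def accepts(suffix, state=1):
    """Depth-first search over the NFA; returns True if suffix is accepted."""
    if suffix == "" and state in FINAL:
        return True
    for label, target in NFA[state]:
        # Try every edge whose label is a prefix of the remaining input.
        if suffix.startswith(label) and accepts(suffix[len(label):], target):
            return True
    # All alternatives from this state failed: the caller backtracks.
    return False


print(accepts("eres"))   # the endings of klein+er+es
print(accepts("se"))
```

Because several edges may match at each state, the search may explore many dead ends before it succeeds, which is where the exponential worst case comes from.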
Deterministic Finite Automata (DFA)
So what about the worst-case exponential complexity of NFAs?
Deterministic Finite Automata (DFA) are linear in the word length in the worst case
For each NFA, there is always an equivalent DFA (Hopcroft and Ullman 1979)
DFA, Definition
A deterministic finite automaton is a quintuple (Q, Σ, δ, q0, F), where
  Q is a finite set of states
  Σ is a finite set of symbols
  δ is a transition function δ : Q × Σ → Q
  q0 ∈ Q is a unique initial state
  F ⊆ Q is a set of final states
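The linear bound follows from the fact that a DFA performs exactly one table lookup per input symbol. A minimal Python sketch is given below; the transition table is a hypothetical DFA for German adjective endings, and the state names A to F, the explicit sink state and the choice of final states are assumptions of this illustration.

```python
SINK = "sink"  # non-final state entered on any symbol that is not licit

# Hypothetical transition function δ as a dictionary: (state, symbol) -> state.
DELTA = {
    ("A", "e"): "B", ("A", "s"): "C",
    ("B", "r"): "D", ("B", "m"): "F", ("B", "s"): "F", ("B", "n"): "F",
    ("C", "t"): "D",
    ("D", "e"): "E",
    ("E", "m"): "F", ("E", "s"): "F", ("E", "n"): "F", ("E", "r"): "F",
}
FINAL = {"A", "B", "D", "E", "F"}


def accepts(word, start="A"):
    """Run the DFA: one lookup per symbol, so time is linear in len(word)."""
    state = start
    for symbol in word:
        # Missing entries lead to the sink, which has no way out.
        state = DELTA.get((state, symbol), SINK)
    return state in FINAL


print(accepts("eres"))
print(accepts("se"))
```

There is no backtracking here: the current state is always unique, so no choice ever has to be undone.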
From NFA to DFA
For each nondeterministic finite state machine, there is an equivalent deterministic finite state machine
Steps to take:
1 Expand edges that take more than one input character
2 Eliminate ε-edges (by adding alternative edges)
3 Construct the power automaton (recursively combine states reached by the same input symbol)
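The construction can be sketched in Python as follows. This is a variant of the steps above, not a literal transcription: step 1 is done by hand in the example table, and instead of rewriting the graph to remove ε-edges (step 2), the sketch takes the ε-closure of each state set while building the power automaton (step 3). The example NFA and all names (`eps_closure`, `determinise`, `run`) are assumptions of this illustration.

```python
def eps_closure(states, edges):
    """All states reachable from `states` using only ε-edges (label "")."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, target in edges.get(q, []):
            if label == "" and target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def determinise(edges, start, finals):
    """Power-set construction; edge labels are single symbols or ""."""
    start_set = eps_closure({start}, edges)
    delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        symbols = {lab for q in current for lab, _ in edges.get(q, []) if lab}
        for a in symbols:
            # Combine all states reached from `current` by the same symbol.
            step = {t for q in current for lab, t in edges.get(q, []) if lab == a}
            target = eps_closure(step, edges)
            delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & set(finals)}
    return start_set, delta, dfa_finals


# Assumed adjective NFA after step 1 (er and st split via q1a and q1b).
NFA = {
    "q0": [("e", "q1a"), ("s", "q1b"), ("", "q1")],
    "q1a": [("r", "q1")],
    "q1b": [("t", "q1")],
    "q1": [("e", "q2"), ("", "q3")],
    "q2": [("s", "q3"), ("m", "q3"), ("n", "q3"), ("r", "q3"), ("", "q3")],
    "q3": [],
}
START, DELTA, FINALS = determinise(NFA, "q0", {"q3"})


def run(word):
    """Run the resulting DFA; a missing transition means rejection."""
    state = START
    for symbol in word:
        if (state, symbol) not in DELTA:
            return False
        state = DELTA[(state, symbol)]
    return state in FINALS


print(len({START} | set(DELTA.values())), "DFA states")
print(run("eres"), run("se"))
```

In general the power automaton can have up to 2^|Q| states, although only the reachable subsets are constructed here.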
Expanding multiple symbol edges
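[Figure: the adjective NFA (states q0 to q3, start state q0) before and after expansion. The two-character edges er and st are split into single-character edges through the new intermediate states q1a and q1b.]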
Based on Crysmann 2006
Eliminating ε-edges
[Figure sequence over three slides: the ε-edges of the expanded NFA (states q0, q1a, q1b, q1, q2, q3) are eliminated one at a time. Each removed ε-edge is compensated by adding alternative edges, until no ε-edge remains.]
Based on Crysmann 2006
Constructing a power automaton
[Figure: the ε-free NFA and the resulting power automaton. The power automaton has the states {q0} (start), {q1a,q2,q3}, {q1b}, {q1,q3}, {q2,q3} and {q3}; its edges are labelled e, s, r, t, "m,s,n" and "m,s,n,r".]
Based on Crysmann 2006
Finite State Transducers
Finite State Transducers are variants of Finite State Machines that accept languages over symbol pairs (a:a, a:c) instead of single symbols
Conventionally, left-hand symbols correspond to the lexicon input, and right-hand symbols to the surface string
The empty symbol ∅ can appear on both the input side and the output side; the symbol “=” (or @) stands for ’any’ symbol
FSTs can be used to implement phonological rules ([Johnson(1972)])
Based on Crysmann 2006
A Finite State Transducer
y + s → ies
[Figure: transducer with states q0 (start), q1, q2 and q3 and edges labelled =:=, y:i, ∅:e and ∅:s.]
Based on Crysmann 2006
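A transducer of this kind can be simulated with a few lines of Python. The arrangement of the edges is an assumption, since the transcript preserves only the labels: the sketch assumes a loop =:= on q0, followed by q0 -y:i-> q1 -∅:e-> q2 -∅:s-> q3 with q3 final, so that the lexical input is the bare stem and the plural ending is inserted on the surface side. The names `ARCS` and `transduce` are hypothetical.

```python
EPS = ""    # the empty symbol ∅
ANY = "="   # the 'any' symbol: copies whatever input symbol it reads

# Arcs as (state, input symbol, output symbol, next state); assumed topology.
ARCS = [
    ("q0", ANY, ANY, "q0"),
    ("q0", "y", "i", "q1"),
    ("q1", EPS, "e", "q2"),
    ("q2", EPS, "s", "q3"),
]
FINAL = {"q3"}


def transduce(lexical, state="q0"):
    """Return the set of surface strings that the FST pairs with `lexical`."""
    results = set()
    if lexical == "" and state in FINAL:
        results.add("")
    for src, inp, out, dst in ARCS:
        if src != state:
            continue
        if inp == EPS:
            # ∅ on the input side: emit output without consuming input.
            results |= {out + rest for rest in transduce(lexical, dst)}
        elif lexical and inp in (ANY, lexical[0]):
            emitted = lexical[0] if out == ANY else out
            results |= {emitted + rest for rest in transduce(lexical[1:], dst)}
    return results


print(transduce("fly"))
print(transduce("cat"))
```

Under these assumptions the transducer also maps boy to the wrong form boies: a realistic rule would have to restrict the left context of y to consonants, which illustrates the over-generation problem of rule-based approaches.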
Summary
Morphemes are minimal sign/meaning pairs
Morphological analysis plays a role in reduction of lexicon size, unknown word recognition, etc.
Several meaning units can be mapped onto one morpheme (multi-exponence)
Phenomena such as reduplication, syncretism, allomorphy, and morphophonological processes mean that morphemes are not always easy to recognize
Finite state machines are the standard (basic) technique for morphological analysis
Bibliography
Chomsky, Noam and Halle, Morris. 1968. The Sound Pattern of English. New York, USA: Harper and Row.
Crysmann, Berthold. 2006. Foundations of Language Science and Technology: Morphology. http://www.coli.uni-saarland.de/~hansu/courses/FLST05/schedule.html. Accessed on the 14th of August 2008.
Crystal, David. 1997. The Cambridge Encyclopedia of Language. Cambridge, UK: Cambridge University Press.
Johnson, C. Douglas. 1972. Formal Aspects of Phonological Description. The Hague, NL: Mouton.
Kiparsky, Paul. 1987. The Phonology of Reduplication.
Payne, Thomas E. 1997. Morphosyntax: a guide for field linguists. Cambridge, UK: Cambridge University Press.
Sproat, Richard. 1992. Morphology and Computation. Cambridge, USA: MIT Press.