Dictionaries and Grammar Do we include all forms of a particular word, or do we include only the base word and derive its forms? How are the grammatical rules of a language represented? How do we represent the parts of speech that go with particular grammatical rules? Questions to Address


Dec 14, 2015


Rolando Timbs

Dictionaries and Grammar

• Do we include all forms of a particular word, or do we include only the base word and derive its forms?

• How are the grammatical rules of a language represented?

• How do we represent the parts of speech that go with particular grammatical rules?

Questions to Address


Morphology

• Morphology – The study of the patterns used to form words
  – E.g. inflection, derivation, and compounds
• Morpheme – Minimal meaning-bearing unit
  – Could be a stem or an affix
• Stem {“unthinkable”, “realization”, “distrust”}
  – The part of a word that contains the root meaning (e.g. cat)
• Affixes {-s, un-, de-, -en, -able, -ize, -hood}
  – A linguistic element added to a word to modify its meaning
  – E.g. prefix (unbuckle), suffix (buckled), infix (absobloodylutely), and circumfix (gesagt, German for “said”)
  – Affixes can attach to other affixes (boyishness)

Definitions


Knowing Words

• When we know a word, we know its
  1. Phonological sound sequences
  2. Semantic meanings
  3. Morphological relationships
  4. Syntactic categories and proper structure of a sentence
• Morphological relationships adjust word meanings
  – Person: Jill waits.
  – Number: Jill carried two buckets.
  – Case: The chair’s leg is broken.
  – Tense: Jill is waiting there now.
  – Degree: Jill ran faster than Jack.
  – Gender: Jill is female.
  – Part of speech: Jill is a proper noun.

These are the kinds of things we want our computers to figure out


Units of Meaning

• How many morphemes does each of the following sentences have?
  – “I have two cats”
  – “She wants to leave soon”
  – “He walked across the room”
  – “Her behavior was unbelievable”
• Free morphemes {eye, think, run, apple}
• Bound morphemes {-able, un-, -s, -tion, -ly}


Affix Examples

• Prefixes from Karuk, a Hokan language of California
  – [pasip] “Shoot!”
  – [nipasip] “I shoot”
  – [/upasip] “She/he shoots”
• Suffixes from Mende, spoken in Liberia and Sierra Leone
  – [pElE] “house”, [pElEi] “the house”
  – [mEmE] “glass”, [mEmEi] “the glass”
• Infixes from Bontoc, spoken in the Philippines
  – [fikas] “strong”, [fumikas] “she is becoming strong”
  – [fusul] “enemy”, [fumusul] “she is becoming an enemy”


Turkish Morphology

Uygarlastiramadiklarimizdanmissinizcasina

Meaning: ‘behaving as if you are among those whom we could not civilize’

• Uygar ‘civilized’ + las ‘become’ + tir ‘cause’
• + ama ‘not able’ + dik ‘past’
• + lar ‘plural’ + imiz ‘p1pl’ + dan ‘abl’
• + mis ‘past’ + siniz ‘2pl’ + casina ‘as if’


How does the Mind Store Meanings?

• Hypotheses
  – Full listing: We store all words individually
  – Minimum redundancy: We store morphemes and how they relate
• Analysis
  – Determine if people understand new words based on root meanings
  – Observe whether children have difficulty learning exceptions
  – Regular form: government/govern; irregular form: department/depart
• Evidence suggests
  – The mind represents words and affix meanings separately
  – Linguists observe that affixes were originally separate words that speakers slur together over time


General Observations about Lexicons

• Meanings are continually changing

• Roots and Morphemes do not have to occur in a fixed position in relation to other elements.

• How many words do people know?
  – Shakespeare uses 15,000 words
  – A typical high school student knows 60,000 (learning 10 words a day from 12 months to 18 years)
• How many English words are there?
  – Over 300,000 words without Morphemes in 1988


Computational Morphology

• Consider all of the word forms built on ‘true’
  – true, truer, truest, truly, untrue, truth, truthful, truthfully, untruthfully, untruthfulness
  – untruthfulness = un- + true + -th + -ful + -ness
• Productive morphemes
  – An affix that at some point in time spread rapidly through the language
  – Consider goose and geese versus cat and cats
    • The former is an older way to indicate plurals
    • The latter is a more recent way that spread throughout the language
• If we store morpheme rules, not all words, we can
  – Reduce storage requirements and simplify creating entire dictionaries
  – More closely mimic how the mind does it
  – Automatically understand newly encountered word forms

Speech recognition requires a language dictionary. How many words would it contain?


Morphology Rules

• There are rules used to form complex words from their roots
  – ‘re-’ only precedes verbs (rerun, release, return)
  – ‘-s’ indicates plurals
  – ‘-ed’ indicates past tense
• Affix rules
  – Regular: follow productive affix rules
  – Irregular: don’t follow productive affix rules
• Nouns
  – Regular: (cat, thrush), (cats, thrushes), (cat’s, thrushes’)
  – Irregular: (mouse, ox), (mice, oxen)

Observation: More frequent words resist changes that result from productive affixes and take irregular forms (e.g. am, is, are).

Exceptions: A singer sings, and a writer writes. Why doesn’t a whisker whisk, a spider spid, or a finger fing?


Parsing

• Morphological parsing
  – Identifies the stem and affixes and how they relate
  – Example:
    • fish → fish +Noun +Singular or goose +Verb
    • fish → fish +Noun +Plural
    • fish → fish +Verb +Singular
  – Bracketing: indecipherable → [in [[de [cipher]] able]]
• Why do we parse?
  – Spell-checking: Is muncheble a real word?
  – Identify a word’s part of speech (POS)
  – Sentence parsing and machine translation
  – Identify word stems for data mining search operations
  – Speech recognition and text to speech

Identify components and underlying structure


Parsing Applications

• Lexicon
  – Create a word list
  – Include both stems and affixes (with the part of speech)
• Morphotactics
  – Model how morphemes can be affixed to a stem
  – E.g. the plural morpheme follows the noun in English
• Orthographic rules
  – Define spelling modifications during affixation
  – E.g. true → tru in the context of true → truthfully


Grammatical Morphemes

• New forms are rarely added to closed morpheme classes

• Examples
  – prepositions: at, for, by
  – articles: a, the
  – conjunctions: and, but, or


Morphological Parsing (stemming)

• Goal: Break the surface input into morphemes

• foxes
  – Fox is a noun stem
  – It has -es as a plural suffix
• rewrites
  – Write is the verb stem
  – It has re- as a prefix meaning to do again
  – It has an -s suffix marking the third-person singular present tense


Inflectional Morphology

• Nouns
  – plural marker: -s (dog + s = dogs)
  – possessive marker: -’s (dog + ’s = dog’s)
• Verbs
  – 3rd person present singular: -s (walk + s = walks)
  – past tense: -ed (walk + ed = walked)
  – progressive: -ing (walk + ing = walking)
  – past participle: -en or -ed (eat + en = eaten)
• Adjectives
  – comparative: -er (fast + er = faster)
  – superlative: -est (fast + est = fastest)
• In English
  – Meaning transformations are predictable
  – All inflectional affixes are suffixes
  – Inflectional affixes are attached after any derivational affixes (next slide)
    • E.g. modern + ize + s = modernizes; not modern + s + ize

Does not change the grammatical category


Concatenative and Non-concatenative

• Concatenative morphology combines by concatenation
  – prefixes and suffixes
• Non-concatenative morphology combines in complex ways
  – circumfixes and infixes
  – templatic morphology
    • words change by internal changes to the root
    • E.g. (Arabic, Hebrew) ktb (write), kuttib (will have been written)

Templatic example: template C V C C V C + root consonants k t b + vowels u i → kuttib
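The templatic example can be sketched in a few lines of Python. This is only an illustration: the indexed slot names (C1, V1, ...) and the function name `fill_template` are assumptions introduced here, and the doubling of the middle consonant is simply written into the template instead of being derived by an association rule.

```python
def fill_template(template, root, vowels):
    """Interleave root consonants and vowels according to indexed slots.

    A slot such as "C2" means "the 2nd root consonant"; "V1" means
    "the 1st vowel". Repeating a slot repeats the segment.
    """
    segments = []
    for slot in template:
        kind, position = slot[0], int(slot[1:]) - 1
        source = root if kind == "C" else vowels
        segments.append(source[position])
    return "".join(segments)


# CVCCVC template with the second consonant doubled (an assumption
# about how the slide's template lines up with the root k-t-b).
CVCCVC = ["C1", "V1", "C2", "C2", "V2", "C3"]

print(fill_template(CVCCVC, "ktb", "ui"))  # kuttib
```

The same template applied to a different three-consonant root gives the corresponding form of that root, which is the point of storing roots and templates separately.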


Verbal Inflective Morphology

• Verbal inflection
  – Main verbs (sleep, like, fear) are relatively regular
    • Standard morphemes: -s, -ing, -ed
    • These morphemes are productive: emails, emailing, emailed
  – Combination with nouns for syntactic agreement: I am, we are, they were
• There are exceptions
  – Eat (will eat, eats, eating, ate)
  – Catch (will catch, catches, catching, caught)
  – Be (will be, is, being, was)
  – Have (will have, has, having, had)
• General observations about English
  – There are approximately 250 irregular verbs
  – Other languages have more complex verbal inflection rules


Nominal Inflective Morphology

• Plural forms (s or es)
• Possessives (cat’s or cats’)
• Regular nouns
  – Singular (cat, bush)
  – Plural (cats, bushes)
  – Possessive (cat’s, bushes’)
• Irregular nouns
  – Singular (mouse, ox)
  – Plural (mice, oxen)


Derivational Morphology

• Word stem combines with grammatical morpheme
  – Usually produces a word of a different class
  – Complex rules that are less productive, with many exceptions
  – Sometimes meanings of derived terms are hard to predict (e.g. hapless)
• Examples: verbs to nouns
  – generalize, realize → generalization, realization
  – murder, spell → murderer, speller
• Examples: verbs and nouns to adjectives
  – embrace, pity → embraceable, pitiable
  – care, wit → careless, witless
• Example: adjectives to adverbs
  – happy → happily
• More complicated to model than inflection
  – Less productive: science-less, concern-less, go-able, sleep-able


Derivational Morphology Examples

Level 1
• Examples: ize, ization, ity, ic, al, ion, y, ate, ous, ive, ation
• Observations
  – Can attach to non-words (e.g. fratern-al, patern-al)
  – Often change the stem’s stress and vowel quality

Level 2
• Examples: hood, ness, ly, s, ing, ish, ful, less, y (adj.)
• Observations
  – Never precede Level 1 suffixes
  – Never change stress or vowel quality
  – Almost always attach to words that exist

Level 1 + Level 1: histor-ic-al, illumin-at-ion, indetermin-at-y
Level 1 + Level 2: fratern-al-ly, transform-ate-ion-less
Level 2 + Level 2: weight-less-ness
Big one: antidisestablishmentarianism


Adjective Morphology

• Standard forms
  – Big, bigger, biggest
  – Cool, cooler, coolest, coolly
  – Red, redder, reddest
  – Clear, clearer, clearest, clearly, unclear, unclearly
  – Happy, happier, happiest, happily
  – Unhappy, unhappier, unhappiest, unhappily
  – Real, unreal, really
• Exceptions: unbig, redly, realest


Identify and Classify Morphemes

• In each group
  – Two words have the same morphological structure
  – One word has a different type of suffix
  – One word has no suffix at all
• Perform the following tasks
  – 1. Isolate the suffix that two of the words share.
  – 2. Identify whether it is (i) free or bound; (ii) prefix, infix, suffix; (iii) inflectional or derivational.
  – 3. Give its function/meaning.
  – 4. Identify the word that has no suffix.
  – 5. Identify the word that has a suffix which is different from the others in each group.

a.        b.          c.           d.
rider     tresses     running      tables
colder    melodies    foundling    lens
silver    Bess’s      handling     witches
actor     guess       fling        calculates


Computational Techniques

• Regular Grammars

• Finite State Automata

• Finite State Transducer

• Parsing – Top down and bottom up


Regular Grammars

• Grammar: Rules that define legal character strings

• A regular grammar generates exactly the strings that some regular expression matches

• Regular expressions are built up as follows:
  – The expression that matches no strings is regular
  – The expression that matches only the empty string is regular
  – A single character is a regular expression
  – If r1 and r2 are regular expressions, then r1 union r2, and r1 concatenated with r2, are regular expressions
  – If r is a regular expression, then r* (where * means zero or more occurrences) is a regular expression


Notations to Express Regular Expressions

• Concatenation: abc
• Disjunction: [a-zA-Z], gupp(y|ies)
• Counters: a*, a+, a?, a{5}, a{5,8}, a{5,}
• Any character: a.b
• Not: [^0-9]
• Anchors: /^The dog\.$/
  – Note: the backslash before the period is an escape character
  – Other escape sequences include \*, \?, \n, \t, \\, \[, \], etc.
• Operators
  – \d equivalent to [0-9], \D equivalent to [^0-9]
  – \w equivalent to [a-zA-Z0-9_], \W equivalent to [^\w]
  – \s equivalent to [ \r\t\n\f], \S equivalent to [^\s]
• Substitution: s/regExp/replacement/ replaces text that matches regExp with the replacement
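Most of this notation carries over directly to Python's standard `re` module. The short sketch below tries a few of the patterns; the test strings and the function name `notation_demo` are made up for illustration, and Python writes substitution as `re.sub(pattern, replacement, text)` instead of the Perl-style s/.../.../ form.

```python
import re


def notation_demo():
    """Return the result of applying several of the notations above."""
    return {
        # Disjunction with grouping: guppy or guppies
        "disjunction": bool(re.fullmatch(r"gupp(y|ies)", "guppies")),
        # Counter: between 5 and 8 occurrences of a
        "counter": bool(re.fullmatch(r"a{5,8}", "aaaaaa")),
        # Any character between a and b
        "any_char": bool(re.fullmatch(r"a.b", "axb")),
        # Negated class: everything that is not a digit
        "not_digit": re.findall(r"[^0-9]", "a1b2"),
        # Anchors plus an escaped period
        "anchors": bool(re.search(r"^The dog\.$", "The dog.")),
        # Substitution
        "substitute": re.sub(r"colou?r", "hue", "color and colour"),
    }


for name, result in notation_demo().items():
    print(name, result)
```

One caveat: for Python string patterns, `\w`, `\d`, and `\s` also match non-ASCII characters unless the `re.ASCII` flag is passed, so the bracket equivalents listed above are exact only in ASCII mode.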


Examples of Regular Expressions

• All strings ending with two zeroes
• All strings containing three consecutive zeroes
• All strings in which every block of five consecutive symbols has at least two zeroes
• All strings in which the tenth symbol from the right is a one
• The set of all modular five numbers
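Three of these exercises have short answers that can be checked with Python's `re` module. The sketch assumes the alphabet is {0, 1}, which the slide does not state, and the pattern names are chosen here for illustration.

```python
import re

# Assumption: strings are over the binary alphabet {0, 1}.
ENDS_WITH_TWO_ZEROES = re.compile(r"[01]*00")
CONTAINS_THREE_ZEROES = re.compile(r"[01]*000[01]*")
TENTH_FROM_RIGHT_IS_ONE = re.compile(r"[01]*1[01]{9}")


def in_language(pattern, text):
    """True if the whole string belongs to the language of the pattern."""
    return pattern.fullmatch(text) is not None


print(in_language(ENDS_WITH_TWO_ZEROES, "10100"))
print(in_language(CONTAINS_THREE_ZEROES, "1001001"))
print(in_language(TENTH_FROM_RIGHT_IS_ONE, "1" + "0" * 9))
```

Using `fullmatch` instead of `search` matters here: a regular expression defines a set of whole strings, so the entire input has to match.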


Finite State Automata (FSA)

Definition: An FSA consists of
1. a set of states (Q)
2. a starting state (q0)
3. a set of final or accepting states (F ⊆ Q)
4. a finite set of symbols (Σ)
5. a transition function δ(q, i) that maps Q × Σ to Q. It switches from a from-state to a to-state, based on one of the valid symbols

Synonyms: Finite Automaton, Finite State Machine

FSAs recognize languages that are regular


Recognition

• Traditionally, Turing’s tape reader is used to depict an FSA
• Algorithm
  – Begin in the start state
  – Repeat until you run out of tape:
    • Examine the current input character
    • Consult the table
    • Go to a new state and update the tape pointer
  – The machine accepts the string if processing stops in a final state

Determine if the machine accepts a particular string, i.e. is the string in the language?


Graphs and State Transition Tables

• What can we say about this machine?
  – It has 5 states
  – At least b, a, and ! are in its alphabet
  – q0 is the start state
  – q4 is an accept state
  – It has 5 transitions
• Questions
  – Which strings does it accept? baaaa, aaabaaa, ba
  – Is this the only FSA that can accept this language?

An FSA can only accept regular languages. Question: Can you think of a language that is not regular?

[Figures: state transition table; annotated directed graph]


Recognizer Implementation

index = beginning of tape
state = start state
DO
    IF transition[state, tape[index]] is empty
        RETURN false
    state = transition[state, tape[index]]
    index = index + 1
UNTIL end of tape is reached
IF state is a final state
    RETURN true
ELSE
    RETURN false
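A minimal Python sketch of this table-driven recognizer follows. The transition table is an assumption: it encodes a five-state machine over b, a, and ! with five transitions, matching the description on the earlier slide, but the actual table from the figure is not reproduced in this transcript.

```python
def accepts(tape, transitions, start, finals):
    """Deterministic recognition by table lookup, one symbol at a time."""
    state = start
    for symbol in tape:
        next_state = transitions.get((state, symbol))
        if next_state is None:   # empty table cell: reject
            return False
        state = next_state
    return state in finals       # accept only if we stop in a final state


# Assumed table: (state, symbol) -> next state.
SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}

print(accepts("baaa!", SHEEP_TABLE, start=0, finals={4}))
print(accepts("ba!", SHEEP_TABLE, start=0, finals={4}))
```

Iterating over the tape with a `for` loop also handles the empty string, which the DO ... UNTIL form of the pseudocode does not address.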


Key Points Regarding FSAs

• This algorithm is a state-space search algorithm
  – Implementation uses simple table lookups
  – Success occurs when, at the end of a string, we reach a final state
• The results are always deterministic
  – There is one unique choice at each step
  – The algorithm recognizes all regular languages
• Perl, Java, etc. use a regular expression algorithm
  – Create a state transition table from the expression
  – Pass the table to the FSA interpreter
• FSA algorithms
  – Recognizer: determines if a string is in the language
  – Generator: generates all strings in the language


Non-Deterministic FSA

• Deterministic: Given a state and symbol, only one transition is possible
• Nondeterministic:
  – Given a state and a symbol, multiple transitions are possible
  – Epsilon (ε) transitions: those which DO NOT examine or advance the tape
• The nondeterministic FSA recognizes a string if:
  – At least one transition sequence ends at a final state
  – Note: NOT all sequences have to end at a final state
  – Note: String rejection occurs only when NO sequence ends at a final state

[Figure: examples, including an ε transition]


Concatenation


Closure


Union


Using NFSAs

State transition table (input symbols across the top; ∅ means no transition)

State    b    a      !    ε
0        1    ∅      ∅    ∅
1        ∅    2      ∅    ∅
2        ∅    2,3    ∅    ∅
3        ∅    ∅      4    ∅
4        ∅    ∅      ∅    ∅


NFSA Recognition of “baaa!”


Breadth-first Recognition of “baaa!”


Nondeterministic FSA Example

Input tape:      b   a   a   a   !
State sequence:  q0  q1  q2  q2  q3  q4


Other FSA Examples

Dollars and Cents

Exercise: Create an FSA for each of the following regular expressions
• (0|1)*
• [a-f1-9]
• abc{5}


Non-Deterministic FSA Recognizer

Recognizer(index, state)
    LOOP
        IF end of tape THEN
            IF state is final RETURN true
            ELSE RETURN false
        IF no possible transitions RETURN false
        IF there is only one transition
            state = transition[state, tape[index]]
            IF not an epsilon transition THEN index++
        ELSE
            FOR each possible transition not considered
                result = CALL Recognizer(nextIndex, nextState)
                IF result = true RETURN true
            RETURN false
    END LOOP
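The same idea can be sketched in Python as a depth-first search with backtracking. The table below is an assumption that mirrors the "Using NFSAs" transition table (state 2 has two choices on a). The function is a simplified variant of the pseudocode: it always recurses, even when only one transition is available, and it assumes there are no ε cycles.

```python
EPSILON = "ε"

# (state, symbol) -> set of possible next states.
NFSA_TABLE = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # the nondeterministic choice
    (3, "!"): {4},
}
FINAL_STATES = {4}


def recognize(tape, index=0, state=0):
    """Return True if at least one transition sequence ends in a final state."""
    if index == len(tape) and state in FINAL_STATES:
        return True
    # Try every transition that consumes the current symbol.
    if index < len(tape):
        for next_state in NFSA_TABLE.get((state, tape[index]), ()):
            if recognize(tape, index + 1, next_state):
                return True
    # Try every epsilon transition; these do not advance the tape.
    for next_state in NFSA_TABLE.get((state, EPSILON), ()):
        if recognize(tape, index, next_state):
            return True
    return False   # every alternative from this configuration failed


print(recognize("baaa!"))
print(recognize("ba!"))
```

A breadth-first recognizer would instead carry the whole set of currently reachable states forward one symbol at a time, trading the recursion for set operations.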


FSAs and Morphology

• Apply an FSA to each word in the dictionary to capture the morphological forms.
• Groups of words with common morphology can share FSAs


Building a Lexicon with a FSA


Derivational Rules


Simple Morphology Example

Regular expression: (un)? adj-roots (er | est | ly)?

[Figure: FSA with states q0, q1, q2, q3; arcs labeled un- (or ε), adj-root, and -er, -est, -ly]

From    To    Output
0       1     un
0       1     NULL
1       2     adj-root-list
2       3     er; est; ly

Stop states: q2 and q3
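The regular expression for this machine translates almost directly into Python's `re` syntax. In the sketch below the root list and the function name `parse` are assumptions made for illustration, and the pattern works on plain concatenation, so spelling changes such as happy + er → happier would need separate orthographic rules.

```python
import re

# Hypothetical adjective roots; a real lexicon would supply this list.
ADJ_ROOTS = ["clear", "real", "cool"]

# (un)? adj-root (er|est|ly)?
ADJECTIVE = re.compile(
    r"(un)?(" + "|".join(map(re.escape, ADJ_ROOTS)) + r")(er|est|ly)?"
)


def parse(word):
    """Split a word into (prefix, root, suffix) morphemes, or return None."""
    match = ADJECTIVE.fullmatch(word)
    if match is None:
        return None
    # Optional groups that did not take part in the match are None.
    return tuple(part for part in match.groups() if part)


print(parse("unclearly"))  # ('un', 'clear', 'ly')
print(parse("clearer"))    # ('clear', 'er')
```

Because every root shares one pattern, this machine also accepts forms such as "uncooler", which is the over-generation that motivates splitting the roots into classes.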


An Extended Example

From    To    Output
0       1     un
0       3     NULL
1       2     adj-root-list-1
2       5     er; est; ly
3       2     adj-root-list-1
3       4     adj-root-list-2
4       5     er; est

[Figure: FSA with states q0 to q5; arcs labeled un-, ε, adj-root-1 (twice), adj-root-2, -er/-est/-ly into q5 from q2, and -er/-est into q5 from q4]

Regular expression: (un)? adj-roots-1 (er | est | ly)? | adj-roots-2 (er | est)?

Adj-root1: clear, happy, real
Adj-root2: big, red
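The transition table above can be walked directly. The sketch below works on morpheme sequences (not surface spellings) and copies the arcs and root lists from the slide; the stop states are an assumption, since this slide does not list them.

```python
ROOTS_1 = {"clear", "happy", "real"}
ROOTS_2 = {"big", "red"}

# (from state, to state, accepted morphemes); None marks the NULL arc.
ARCS = [
    (0, 1, {"un"}),
    (0, 3, None),
    (1, 2, ROOTS_1),
    (2, 5, {"er", "est", "ly"}),
    (3, 2, ROOTS_1),
    (3, 4, ROOTS_2),
    (4, 5, {"er", "est"}),
]
STOP_STATES = {2, 4, 5}   # assumption: a bare root or root + suffix is a word


def accepts(morphemes, state=0):
    """True if some path through ARCS consumes all morphemes and stops."""
    if not morphemes and state in STOP_STATES:
        return True
    for source, target, labels in ARCS:
        if source != state:
            continue
        if labels is None:
            # NULL arc: move without consuming a morpheme.
            if accepts(morphemes, target):
                return True
        elif morphemes and morphemes[0] in labels:
            if accepts(morphemes[1:], target):
                return True
    return False


print(accepts(["un", "happy", "ly"]))
print(accepts(["un", "big"]))
```

Splitting the roots into two lists is what lets the machine reject un- + big and red + -ly while still accepting big + -er.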


Representing Derivational Rules

[Figure: FSA for derivational rules. States: nouns, verbs, adjectives, adverbs (noun, adj, verb, adverb). Arc labels: -ation, -ize, -er, -ative, -ive, -able, -ly, -ity, -ness, -ful]


Finite State Transducer (FST)

• Definition: An FST is a 5-tuple consisting of
  – Q: set of states {q0, q1, q2, q3, q4}
  – Σ: an alphabet of complex symbols
    • Each complex symbol contains two simple symbols
    • The first symbol is from an input alphabet, i ∈ I
    • The second symbol is from an output alphabet, o ∈ O
    • Σ is in I × O; ε is the null character
  – q0: a start state
  – F: a set of final states in Q {q4}
  – δ(q, i:o): a transition function mapping Q × Σ to Q
• Concept: Translates and writes to a second tape

[Figure: FST with states q0, q1, q2, q3, q4; arcs labeled b:m, a:o, a:o, !:? plus one more a:o arc]

Example: baaaa → moooo
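A deterministic transducer of this kind fits in a small table-driven function. The arc layout below is an assumption reconstructed from the arc labels (in particular, the extra a:o arc is taken to be a loop on q3), so treat it as a sketch and not as the exact machine in the figure.

```python
# (state, input symbol) -> (output symbol, next state)
FST_ARCS = {
    (0, "b"): ("m", 1),
    (1, "a"): ("o", 2),
    (2, "a"): ("o", 3),
    (3, "a"): ("o", 3),   # assumed self-loop
    (3, "!"): ("?", 4),
}
FST_FINALS = {4}


def transduce(tape, start=0):
    """Read the input tape and return the output tape, or None if rejected."""
    state, output = start, []
    for symbol in tape:
        arc = FST_ARCS.get((state, symbol))
        if arc is None:
            return None
        out_symbol, state = arc
        output.append(out_symbol)
    return "".join(output) if state in FST_FINALS else None


print(transduce("baaa!"))  # mooo?
```

The only difference from the plain recognizer is that each arc carries an output symbol, which is appended to a second tape as the input is consumed.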


Transition Example

• c:c means read a c on one tape and write a c on the other

• +N:ε means read a +N symbol on one tape and write nothing on the other

• +PL:s means read +PL and write an s

c:c a:a t:t +N:ε +PL:s
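Because each arc pairs a lexical symbol with a surface symbol, the same path can be read in either direction. The sketch below hard-codes the single path shown above; the function names `generate` and `parse` are assumptions, and a real transducer would branch over a whole lexicon.

```python
EPSILON = ""   # writes nothing on the surface tape

# One path: (lexical symbol, surface symbol) pairs.
CAT_PLURAL = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPSILON), ("+PL", "s")]


def generate(lexical, path=CAT_PLURAL):
    """Lexical symbols -> surface string, or None if the path does not match."""
    if len(lexical) != len(path):
        return None
    surface = []
    for symbol, (upper, lower) in zip(lexical, path):
        if symbol != upper:
            return None
        surface.append(lower)
    return "".join(surface)


def parse(surface, path=CAT_PLURAL):
    """Surface string -> lexical symbols, reading the same arcs in reverse."""
    lexical, index = [], 0
    for upper, lower in path:
        if lower != EPSILON:
            # This arc must consume one surface character.
            if index >= len(surface) or surface[index] != lower:
                return None
            index += 1
        lexical.append(upper)
    return lexical if index == len(surface) else None


print(generate(["c", "a", "t", "+N", "+PL"]))  # cats
print(parse("cats"))                           # ['c', 'a', 't', '+N', '+PL']
```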


On-line demos

• Finite state automata demos: http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fsinput.html

• Finite state morphology: http://www.xrce.xerox.com/competencies/content-analysis/demos/english

• Some other downloadable FSA tools: http://www.research.att.com/sw/tools/fsm/


Lexicon for L0

Rule based languages


Top Down Parsing

[Figure: parse tree for “I prefer a morning flight”, with nodes S, NP, VP, Nom, Pro, Verb, Det, Noun]

[S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]

S → NP VP, NP→Pro, Pro→I, VP→V NP, V→prefer, NP→Det Nom, Det→a, Nom→Noun Nom, Noun→morning, Noun→flight

Driven by the grammar, working down


Bottom Up Parsing

Driven by the words, working up

The Grammar

0) S → E $
1) E → E + T | E - T | T
2) T → T * F | T / F | F
3) F → num | id

The Bottom Up Parse

 1) id - num * id
 2) F - num * id
 3) T - num * id
 4) E - num * id
 5) E - F * id
 6) E - T * id
 7) E - T * F
 8) E - T
 9) E
10) S (correct sentence)

Note: If there is no rule that applies, backtracking is necessary


Top-Down and Bottom-Up

• Top-down
  – Advantage: Searches only trees that are legal
  – Disadvantage: Tries trees that don’t match the words
• Bottom-up
  – Advantage: Only forms trees matching the words
  – Disadvantage: Tries trees that make no sense globally
• Efficient combined algorithms
  – Link top-down expectations with bottom-up data
  – Example: Top-down parsing with bottom-up filtering


Stochastic Language Models

• Problems
  – A language model cannot cover all grammatical rules
  – Spoken language is often ungrammatical
• Solution
  – Constrain the search space, emphasizing likely word sequences
  – Enhance the grammar to recognize intended sentences even when the sequence doesn't satisfy the rules

A probabilistic view of language modeling


Probabilistic Context-Free Grammars (PCFG)

• Definition: G = (VN, VT, S, R, p)
  – VN = non-terminal set of symbols
  – VT = terminal set of symbols
  – S = start symbol
  – R = set of rules
  – p = set of rule probabilities
• P(S → W | G): the probability that the start symbol S derives the expression W in grammar G
• Training the grammar: Count rule occurrences in a training corpus
  – P(A → α | G) = Count(A → α) / ∑β Count(A → β), where the sum runs over all rules with the same left-hand side A

Goal: Assist in discriminating among competing choices
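The training step amounts to relative-frequency counting. In the sketch below the rule counts are invented for illustration (they do not come from any corpus), and each rule's count is normalized by the total for its left-hand side, which is one common reading of the training formula above.

```python
from collections import defaultdict


def estimate_rule_probabilities(rule_counts):
    """Map each (lhs, rhs) rule to Count(rule) / total count for its lhs."""
    totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        totals[lhs] += count
    return {
        (lhs, rhs): count / totals[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


# Hypothetical counts: how often each rule was used in a training corpus.
RULE_COUNTS = {
    ("S", ("NP", "VP")): 4,
    ("NP", ("Pro",)): 3,
    ("NP", ("Det", "Nom")): 1,
}

for rule, probability in estimate_rule_probabilities(RULE_COUNTS).items():
    print(rule, probability)
```

With these counts, NP → Pro gets probability 3/4 and NP → Det Nom gets 1/4, so a parser choosing between competing NP analyses would prefer the pronoun reading.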


PFSA (Probabilistic Finite State Automata)

• A PFSA is a type of Probabilistic Context-Free Grammar
  – The states are the non-terminals in a production rule
  – The output symbols are the observed outputs
  – The arcs represent context-free rules
  – The path through the automaton represents a parse tree
• A PCFG considers state transitions and the transition path

Rules: S1 → a S2, S2 → b S3, S3 → ε
Automaton: S1 –a→ S2 –b→ S3


Probabilistic Finite State Machines

• Probabilistic models determine weights of the transitions

• The weights leaving a state sum to unity

• Operations
  – Consider the weights to compute the probability of a given string or most likely path.
  – The machine can ‘learn’ the weights over time

[Figure: weighted machine with labels Canine, Companion, Tooth and weights .01, .0035, .001]
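Computing the probability of one path is a product of arc weights. The machine below is hypothetical (its states, symbols, and weights are invented for illustration and are not the figure's values); it also checks that the weights leaving each state sum to one.

```python
import math

# state -> {symbol: (next state, weight)}; all values are made up.
WEIGHTED_ARCS = {
    "start": {"n": ("q1", 1.0)},
    "q1": {"iy": ("end", 0.8), "ey": ("end", 0.2)},
}


def path_probability(symbols, start="start", final="end"):
    """Multiply the weights along the path; 0.0 if the path is not accepted."""
    state, probability = start, 1.0
    for symbol in symbols:
        arc = WEIGHTED_ARCS.get(state, {}).get(symbol)
        if arc is None:
            return 0.0
        state, weight = arc
        probability *= weight
    return probability if state == final else 0.0


def weights_sum_to_one():
    """Check that the weights leaving every state total to unity."""
    return all(
        math.isclose(sum(weight for _, weight in arcs.values()), 1.0)
        for arcs in WEIGHTED_ARCS.values()
    )


print(path_probability(["n", "iy"]))
print(weights_sum_to_one())
```

Finding the most likely path instead of scoring a given one would compare these products across alternatives, typically with dynamic programming when the machine is large.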


Another Example


Pronunciation decoding

[n iy]


Merging the machines together

[n iy]


Another Example