the quest for robustness, scalability, and portability in (spoken) language applications

Linguistics Methodology meets Language Reality:

Bob Carpenter, SpeechWorks International


2

The Standard Cliché(s)

• Moore’s Cliché:
  – Exponential growth in computing power and memory will continue to open up new possibilities

• The Internet Cliché:
  – With the advent and growth of the world-wide web, an ever-increasing amount of information must be managed


3

More Standard Clichés

• The Convergence Cliché:
  – Data, voice and video networking will be integrated over a universal network that:
    • includes land lines and wireless
    • includes broadband and narrowband
    • likely implementation is IP (internet protocol)

• The Interface Cliché:
  – The three forces above (growth in computing power, information online, and networking) will both enable and require new interfaces
  – Speech will become as common as graphics


4

Some Comp Ling Clichés

• The Standard Linguist’s Cliché
  – But it must be recognized that the notion “probability of a sentence” is an entirely useless one, under any known interpretation of this term.
  – Noam Chomsky, 1969 [essay on Quine]

• The Standard Engineer’s Cliché
  – Anytime a linguist leaves the group the recognition rate goes up.
  – Fred Jelinek, 1988 [address to DARPA]


5

The “Theoretical Abstraction”

• mature, monolingual, native language speaker
  – idealized to complete knowledge of language

• static, homogeneous language community
  – all speakers learn identical grammars

• “competence” (vs. “performance”)
  – “performance” is a natural class
  – wetware “implementation” follows theory in divorcing “knowledge of language” from processing

• assumes the existence and innateness of a “language faculty”


6

The Explicit Methodology

• “Empirical” basis is binary grammaticality judgements
  – “intuitive” (to a “properly” trained linguist)
  – innateness and the “language faculty”
  – appropriate for phonetics through dialogue
  – in practice, very little agreement at boundaries and no standard evaluations of theories vs. data

• Models of particular languages
  – by grammars that generate formal languages
  – low priority for transformationalists
  – high priority for monostratalists/computationalists


7

The Holy Grail of Linguistics

• A grammar meta-formalism in which
  – all and only natural language grammars (idealized as above) can be expressed
  – assumed to correspond to the “language faculty”

• Grail is sought by every major camp of linguists
  – Explains why all major linguistic theories look alike from any perspective outside of a linguistics department
  – The expedient abstractions have become an end in themselves


8

But, Applications Require

• Robustness
  – acoustic and linguistic variation
  – disfluencies and noise

• Scalability
  – from embedded devices to palmtops to clients to servers
  – across tasks from simple to complex
  – from system-initiative form-filling to mixed-initiative dialogue

• Portability
  – simple adaptation to new tasks and new domains
  – preferably automated as much as possible


9

The $64,000 Question

• How do humans handle unrestricted language so effortlessly in real time?

• Unfortunately, the “classical” linguistic assumptions and methodology completely ignore this issue

• Psycholinguistics has uncovered some baselines:
  – lexicon (and syntax?): highly parallel
  – time course of processing: totally online
  – information integration: <= 200 ms for all sources

• But it is short on explanations


10

(AI) Success by Stupidity

• Jaime Carbonell’s Argument (ECAI, mid 1990s)
• Apparent “intelligence” arises because systems are too limited to do anything wrong: the “right” answer is hardcoded

• Typical in Computational NL Grammars
  – lexicon limited to the demo
  – rules limited to common ones (eg: no heavy shift)

• Scaling up usually destroys this limited “success”
  – 1,000,000s of “grammatical” readings with large grammars


11

My Favorite Experiments (I)

• Mike Tanenhaus et al. (Univ. Rochester)
• Head-Mounted Eye Tracking
  – example instruction: “Pick up the yellow plate”
  – eyes track semantic resolution within ~200 ms
• Clearly shows that understanding is online


12

My Favorite Experiments (II)

• Garden Paths are Context Sensitive
  – Crain & Steedman (U. Connecticut & U. Edinburgh)
  – if the noun is not unique in context, postmodification is much more likely than if the noun picks out a unique individual

• Garden Paths are Frequency and Agreement Sensitive
  – Tanenhaus et al.
  – The horse raced past the barn fell. (raced is likely a past tense)
  – The horses brought into the barn fell. (brought is likely a participle, and a less likely activity for horses)


13

Stats: Explanation or Stopgap

• A Common View
  – Statistics are some kind of approximation of underlying factors requiring further explanation.

• Steve Abney’s Analogy (AT&T Labs)
  – Statistical Queueing Theory
  – Consider traffic flows through a toll gate on a highway.
  – Underlying factors are diverse, and explain the actions of each driver, their cars, possible causes of flat tires, drunk drivers, etc.
  – Statistics is more insightful [explanatory] in this case as it captures emergent generalizations
  – It is a reductionist error to insist on a low-level account


14

Competence vs. Performance

• What is computed vs. how it is computed
• The what can be traditional grammatical structure
• Not all structures are computed, regardless of the how
• Define the what probabilistically, independently of the how


15

Algebraic vs. Statistical

• False Dichotomy
  – All statistical systems have an algebraic basis, even if trivial

• The Good News:
  – Best statistical systems have the best linguistic conditioning (most “explanatory” in the traditional sense)
  – Statistical estimators are far less significant than the appropriate linguistic conditioning
  – Rest of the talk provides examples of this


16

Bayesian Statistical Modeling

• Concerned with prior and posterior probabilities
• Allows updates of reasoning
• Bayes’ Law: P(A,B) = P(A|B) P(B) = P(B|A) P(A)
• Eg: Source/Channel Model for Speech Recognition
  – Ws: sequence of words
  – As: sequence of acoustic observations
  – Compute ArgMax_Ws P(Ws|As)

    ArgMax_Ws P(Ws|As)
      = ArgMax_Ws P(As|Ws) P(Ws) / P(As)
      = ArgMax_Ws P(As|Ws) P(Ws)

  – P(As|Ws): acoustic model
  – P(Ws): language model
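To make the decision rule concrete, here is a minimal Python sketch of source/channel decoding. The candidate word strings and the two log-probability tables are invented stand-ins for a real acoustic model P(As|Ws) and language model P(Ws); they are not from the slides.

```python
# Hypothetical log scores standing in for real acoustic and language models.
acoustic_logprob = {
    "flights from boston today": -12.0,
    "flights from austin today": -11.5,
    "lights for boston to pay":  -11.0,
}
lm_logprob = {
    "flights from boston today": -8.0,
    "flights from austin today": -9.5,
    "lights for boston to pay":  -15.0,
}

def decode(candidates):
    # ArgMax_Ws P(As|Ws) P(Ws): P(As) is constant across Ws and drops out,
    # so we just maximize the sum of the two log scores.
    return max(candidates, key=lambda ws: acoustic_logprob[ws] + lm_logprob[ws])

print(decode(list(acoustic_logprob)))   # -> 'flights from boston today'
```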


17

Simple Bayesian Update Example

• Monty Hall’s Let’s Make a Deal
• Three curtains with prize behind one, no other info
• Contestant chooses one of three
• Monty then opens a curtain of one of the others that does not have the prize
  – if you choose curtain 2, then one of curtain 1 or 3 must not contain the prize

• Monty then lets you either keep your first guess, or change to the remaining curtain he didn’t open.

• Should you switch, stay, or doesn’t it matter?


18

Answer

• Yes! You should switch.
• Why? Consider the possibilities (rows: curtain you select; columns: curtain the prize is behind):

  Stay (P(win) = 1/3):
                     prize behind
  you select      1      2      3
       1         win    lose   lose
       2         lose   win    lose
       3         lose   lose   win

  Switch (P(win) = 2/3):
                     prize behind
  you select      1      2      3
       1         lose   win    win
       2         win    lose   win
       3         win    win    lose
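The 1/3 vs. 2/3 split is also easy to confirm by simulation. A minimal sketch (the function and trial count are illustrative only):

```python
import random

def monty_trial(switch):
    prize = random.randrange(3)
    choice = random.randrange(3)
    # Monty opens a curtain that is neither the contestant's choice nor the prize.
    opened = next(c for c in range(3) if c != choice and c != prize)
    if switch:
        choice = next(c for c in range(3) if c != choice and c != opened)
    return choice == prize

trials = 100_000
for strategy, switch in (("stay", False), ("switch", True)):
    wins = sum(monty_trial(switch) for _ in range(trials))
    print(strategy, round(wins / trials, 3))   # stay ~ 0.333, switch ~ 0.667
```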


19

Defaults via Bayesian Inference

• Bayesian Inference provides an explanation for “rationality” of default reasoning

• Reason by choosing an action to maximize expected payoff given some knowledge
  – ArgMax_Action Payoff(Action) * P(Action|Knowledge)

• Given additional information, update to Knowledge’
  – ArgMax_Action Payoff(Action) * P(Action|Knowledge’)
  – Chosen action may be different, as in Let’s Make a Deal

• Inferences are not logically sound, but are “rational”
• Bayesian framework integrates partiality and uncertainty of background knowledge
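A toy rendering of the decision rule above, reusing the Let’s Make a Deal probabilities. The payoffs, probability tables, and function name are illustrative assumptions, not part of the slides:

```python
def best_action(actions, payoff, prob):
    # ArgMax_Action Payoff(Action) * P(Action wins | Knowledge)
    return max(actions, key=lambda a: payoff[a] * prob[a])

actions = ["stay", "switch"]
payoff  = {"stay": 1.0, "switch": 1.0}     # the prize is worth the same either way
before  = {"stay": 1/3, "switch": 1/3}     # knowledge before Monty opens a curtain
after   = {"stay": 1/3, "switch": 2/3}     # updated Knowledge' after he opens one

print(best_action(actions, payoff, before))   # tie: either action is as good
print(best_action(actions, payoff, after))    # -> 'switch'
```

The point is just that the chosen action can flip when Knowledge is updated to Knowledge’, exactly as in the curtain example.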


20

Example: Allophonic Variation

• English Pronunciation (M. Riley & A. Ljolje, AT&T)
• Derived from TIMIT with phoneme/phone labels
  – orthographic: bottle
  – phonological: / b aa t ax l / (ARPAbet phonemes)
  – phonetic (TIMITbet phones):
      0.75 [ b aa dx el ]
      0.13 [ b aa t el ]
      0.10 [ b aa dx ax l ]
      0.02 [ b aa t ax l ]

• Allophonic variation is non-deterministic


21

Eg: Allophonic Variation (cont’d)

• Simple statistical model (simplified w/o insertion)
• Estimate probability of phones given phonemes:

    P(a1,…,aM | p1,…,pM)
      = P(a1 | p1,…,pM) * P(a2 | p1,…,pM, a1) * … * P(aM | p1,…,pM, a1,…,aM-1)

• Approximate phoneme context to +/- K phones
• Approximate phone history to 0 or 1 phones
  – 0: … P(aJ | pJ-K,…,pJ,…,pJ+K) …
  – 1: … P(aJ | pJ-K,…,pJ,…,pJ+K, aJ-1) …

• Uses word boundary marker and stress
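A minimal sketch of the +/-K window and limited-history approximation. The function name and the `cond_prob` lookup are hypothetical; a real model would estimate those conditional probabilities from TIMIT-style aligned data (and, per the next slide, smooth them with decision trees).

```python
from math import prod

def phone_seq_prob(phones, phonemes, cond_prob, K=1, history=1):
    """Approximate P(a1,...,aM | p1,...,pM) as a product of local terms
    P(aJ | pJ-K,...,pJ+K [, aJ-1]).  Assumes a one-to-one phone/phoneme
    alignment (the slides' 'simplified w/o insertion' case)."""
    padded = ["#"] * K + list(phonemes) + ["#"] * K   # '#' = word boundary marker
    terms = []
    for j, a in enumerate(phones):
        window = tuple(padded[j : j + 2 * K + 1])     # phonemes pJ-K ... pJ+K
        prev = phones[j - 1] if history == 1 and j > 0 else None
        terms.append(cond_prob(a, window, prev))
    return prod(terms)
```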


22

Eg: Allophonic Variation (concl’d)

• Cluster phonological features using decision trees
• Sparse data smoothed by decision trees over standard features (+/- stop, voicing, aspiration, etc.)

• Conditional entropy: 1.5 bits without context, 0.8 bits with context
• Most likely allophone correct 85.5% of the time; in top 5, 99%
• Average 17 pronunciations/word to get to 95%
• Robust: handles multiple pronunciations
• Scalable: to the whole of English pronunciation
• Portable: easy to move to new dialects with training
  – K. Knight (ISI): similar techniques for Japanese pronunciation of English words!


23

Example: Co-articulation

• HMMs have been applied to speech since the mid-70s
• Two major recent improvements, the first being simply more training data and cycles
• Second is: context-dependent triphones
• Instead of one HMM per phoneme/phone, use one per context-dependent triphone
  – example: t-r+u ‘an r preceded by t and followed by u’
  – crucially clustered by phonological features to overcome sparsity
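A small sketch of how a phone string expands into the context-dependent triphone labels the slide describes. The `#` boundary symbol and function name are my own assumptions:

```python
def triphones(phones):
    # 't-r+u' reads as "an r preceded by t and followed by u"
    padded = ["#"] + list(phones) + ["#"]   # '#' marks utterance boundaries
    return [f"{left}-{mid}+{right}"
            for left, mid, right in zip(padded, padded[1:], padded[2:])]

print(triphones(["t", "r", "u"]))   # ['#-t+r', 't-r+u', 'r-u+#']
```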


24

Exploratory Data Analysis

(Trendier: data mining; Trendiest: information harvesting)

• Specious Argument: A statistical model won’t help explain linguistic processes.

• Counter 1: Abney’s anti-reductionist argument
• But even if you don’t believe that:
• Counter 2: In “other sciences” (pace the linguistic tradition), statistics is used to discover regularities
• Allophone example: “had your” pronunciation
  – / d / is 51% likely to realize as [ jh ], 37% as [ d ]
  – if / d / realizes as [ jh ], / y / deletes 84% of the time
  – if / d / realizes as [ d ], / y / deletes 10% of the time


25

Balancing Gricean Maxims

• Grice gives us conflicting maxims:
  – quantity (exactly as informative as required)
  – quality (try to make your contribution true)
  – manner (be perspicuous; eg. avoid ambiguity, be brief)

• Manner pulls in opposite directions
  – quality without ambiguity lengthens statements
  – quantity and (part of) manner require brevity

• Balance by estimating a multidimensional “goodness” metric for generation


26

Gricean Balance (cont’d)

• Consider the problem of aggregation in generation:
  – Every student ran slowly or every student walked quickly.
  aggregates to:
  – Every student ran slowly or walked quickly.

• This reduces sentence length, shortens clause length, and increases ambiguity.

• These tradeoffs need to be balanced
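One way to read “estimate a multidimensional goodness metric” is as a weighted score over competing realizations. The features, weights, and reading counts below are invented purely to illustrate the balancing act, not taken from the talk:

```python
def goodness(candidate, weights):
    # Trade off brevity (quantity/manner) against ambiguity (manner).
    score = 0.0
    score -= weights["length"] * candidate["n_words"]
    score -= weights["ambiguity"] * candidate["n_readings"]
    return score

weights = {"length": 0.1, "ambiguity": 1.0}   # would be estimated from data
candidates = [
    {"text": "Every student ran slowly or walked quickly.",
     "n_words": 7, "n_readings": 2},
    {"text": "Every student ran slowly or every student walked quickly.",
     "n_words": 9, "n_readings": 1},
]
best = max(candidates, key=lambda c: goodness(c, weights))
print(best["text"])   # with these weights, the longer unambiguous version wins
```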


27

Collins’ Head/Dependency Parser

• Michael Collins, 1998 UPenn PhD thesis
• Parses WSJ with ~90% constituent precision/recall
• Generative model of tree probabilities
• Clever Linguistic Decomposition and Training
  – P(RootCat, HeadTag, HeadWord)
  – P(DaughterCat | MotherCat, HeadTag, HeadWord)
  – P(SubCat | MotherCat, DtrCat, HeadTag, HeadWord)
  – P(ModifierCat, ModifierTag, ModifierWord | SubCat, MotherCat, DaughterCat, HeadTag, HeadWord, Distance)
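To see how the decomposition turns into a tree score, here is a rough sketch: the tree log probability sums one root term plus, for each local tree, a head-daughter term, a subcat term, and one term per modifier. The `p_*` arguments are hypothetical conditional-probability functions estimated from the treebank; this is a paraphrase of the factorization above, not Collins’ actual code.

```python
import math

def tree_logprob(p_root, p_daughter, p_subcat, p_modifier, root, rules):
    """root = (RootCat, HeadTag, HeadWord); rules lists, for each local tree,
    (MotherCat, DaughterCat, HeadTag, HeadWord, SubCat, modifiers), where each
    modifier is (ModifierCat, ModifierTag, ModifierWord, Distance)."""
    root_cat, head_tag, head_word = root
    lp = math.log(p_root(root_cat, head_tag, head_word))
    for mother, dtr_cat, h_tag, h_word, subcat, modifiers in rules:
        lp += math.log(p_daughter(dtr_cat, mother, h_tag, h_word))
        lp += math.log(p_subcat(subcat, mother, dtr_cat, h_tag, h_word))
        for mod_cat, mod_tag, mod_word, distance in modifiers:
            lp += math.log(p_modifier(mod_cat, mod_tag, mod_word,
                                      subcat, mother, dtr_cat,
                                      h_tag, h_word, distance))
    return lp
```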


28

Eg: Collins’ Parser (cont’d)

• Distance encodes heaviness
• Adjunct vs. complement modifiers distinguished
• Head words and tags model lexical variation and word-word attachment preferences
• Also conditions punctuation, coordination, UDCs
• 12,000-word vocabulary plus unknown-word attachment model (by Collins) and tag model (by A. Ratnaparkhi, another 1998 UPenn thesis)
• Smoothed by backing off words to categories
• Trivial statistical estimators; the power is in the conditioning


29

Computational Complexity

• Wide-coverage linguistic grammars generate millions of readings
• But Collins’ parser runs faster than real time on a notebook on unseen sentences of length up to 100
• How? Pruning.
• Collins found that tighter statistical estimates of tree likelihoods, with more features and more complex grammars, ran faster because a tighter beam could be used
  – (E. Charniak & S. Caraballo at Brown have really pushed the envelope here)
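The pruning idea is essentially beam search over chart items: keep only hypotheses whose score is within a fixed log-probability margin of the best one, so tighter probability estimates let a narrower margin survive. A minimal sketch; the item representation and beam width are invented for illustration:

```python
def prune(items, beam=5.0):
    """Keep only items within `beam` log-prob of the best in this cell;
    a tighter beam means fewer survivors and faster parsing."""
    if not items:
        return items
    best = max(score for _, score in items)
    return [(item, score) for item, score in items if score >= best - beam]

# Toy chart cell: (hypothesis, log_prob) pairs.
cell = [("NP -> DT NN", -2.1), ("NP -> NNS", -9.8), ("NP -> NN NN", -4.0)]
print(prune(cell, beam=5.0))   # drops the -9.8 reading
```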


30

Complexity (cont’d)

• Collins’ parser is not complete in the usual sense
• But neither are humans (eg. garden paths)
• Can trade speed for accuracy in statistical parsers
• Syntax is not processed autonomously
  – Humans can’t parse without context, semantics, etc.
  – Even phone or phoneme detection is very challenging, especially in a noisy environment
  – Top-down expectations and knowledge of likely bottom-up combinations prune the vast search space online
  – Question is how to combine it with other factors


31

N-best and Word Graphs

• Speech recognizers can return n-best histories
  – flights from Boston today
  – flights from Austin today
  – flights for Boston to pay
  – lights for Boston to pay

• Can also return a packed word graph of histories; the sum of log probs along a path equals the joint log prob of the acoustics and that word string

  [Word graph: {flights, lights} → {from, for} → {Boston, Austin} → {today, to pay}]
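A word graph like the one sketched above can be represented as edges labeled with words and log probabilities; a path’s score is the sum of its edge scores. A tiny illustrative sketch with made-up node names and scores:

```python
# edges: start_node -> list of (word, log_prob, end_node); None = final node
graph = {
    0: [("flights", -0.4, 1), ("lights", -1.5, 1)],
    1: [("from", -0.5, 2), ("for", -1.2, 2)],
    2: [("Boston", -0.3, 3), ("Austin", -1.4, 3)],
    3: [("today", -0.6, None), ("to pay", -2.0, None)],
}

def paths(node=0, words=(), logp=0.0):
    """Enumerate every (word string, path log prob) through the graph."""
    if node is None:
        yield " ".join(words), logp
        return
    for word, lp, nxt in graph[node]:
        yield from paths(nxt, words + (word,), logp + lp)

print(max(paths(), key=lambda p: p[1]))   # ('flights from Boston today', -1.8)
```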


32

Probabilistic Graph Processing

• The architecture we’re exploring in the context of spoken dialogue systems involves:
  – Speech recognizers that produce probabilistic word graph output
  – A tagger that transforms a word graph into a word/tag graph with scores given by joint probabilities
  – A parser that transforms a word/tag graph into a graph-based chart (as in CKY or chart parsing)

• Allows each module to rescore the output of the previous module’s decisions
• Apply this architecture to speech act detection, dialogue act selection, and in generation
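The key property is that each stage passes along multiple scored hypotheses and adds its own scores rather than committing early. The sketch below simplifies the packed graphs to plain lists of scored hypotheses, and the stage scores are made-up lambdas, purely to show the rescoring idea:

```python
def rescore(hypotheses, stage_logprob):
    """Add one stage's log-prob contribution to every surviving hypothesis."""
    return sorted(((h, lp + stage_logprob(h)) for h, lp in hypotheses),
                  key=lambda x: -x[1])

# Toy recognizer output: (word string, log prob), numbers invented.
recognizer_out = [("flights from boston today", -20.0),
                  ("lights for boston to pay", -26.0)]

# Hypothetical later stages, here just scoring functions over the string.
tagger_score = lambda h: -1.0 if "flights" in h else -3.0
parser_score = lambda h: -2.0 if h.endswith("today") else -5.0

hyps = rescore(recognizer_out, tagger_score)
hyps = rescore(hyps, parser_score)
print(hyps[0])   # best hypothesis after all modules have had their say
```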


33

“Prices rose sharply after hours”
15-best as a word/tag graph + minimization

  [Word/tag graph: prices:{NNS, NN} → rose:{VBD, VBP, NN, NNP} → sharply:{RB} → after:{IN, RB} → hours:{NNS}]


34

Challenge: Beat n-grams

• Backed-off trigram models estimated from 300M words of WSJ provide the best language models

• We know there is more to language than two words of history

• Challenge is to find out how to model it.
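For orientation, a minimal sketch of backed-off trigram scoring: use the trigram estimate when the history has been seen, otherwise back off to the bigram, then the unigram. The counts and the crude fixed `alpha` penalty are illustrative only; real backoff models (eg. Katz backoff) compute proper backoff weights and discounts.

```python
import math

def backoff_logprob(w1, w2, w3, tri, bi, uni, total, alpha=0.4):
    """log P(w3 | w1, w2): trigram if seen, else bigram, else unigram.
    `alpha` is a crude stand-in for properly estimated backoff weights."""
    if (w1, w2, w3) in tri:
        return math.log(tri[(w1, w2, w3)] / bi[(w1, w2)])
    if (w2, w3) in bi:
        return math.log(alpha) + math.log(bi[(w2, w3)] / uni[w2])
    return 2 * math.log(alpha) + math.log(uni.get(w3, 1) / total)

# Tiny made-up counts
tri = {("prices", "rose", "sharply"): 3}
bi  = {("prices", "rose"): 5, ("rose", "sharply"): 4}
uni = {"prices": 20, "rose": 10, "sharply": 6}
print(backoff_logprob("prices", "rose", "sharply", tri, bi, uni, total=1000))
```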


35

Conclusions

• Need ranking of hypotheses for applications
• Beam search can reduce processing time to linear
  – need good statistics to do this

• More linguistic features are better for statistical models
  – can induce the relevant ones and their weights from data
  – linguistic rules emerge from these generalizations

• Using acoustic / word / tag / syntax graphs allows the propagation of uncertainty
  – ideal is totally online (model is compatible with this)
  – approximation allows simpler modules to do first pruning


36

Plugs

Run, don’t walk, to read:

• Steve Abney. 1996. Statistical methods and linguistics. In J. L. Klavans and P. Resnik, eds., The Balancing Act. MIT Press.
• Mark Seidenberg and Maryellen MacDonald. 1999. A probabilistic constraints approach to language acquisition and processing. Cognitive Science.
• Dan Jurafsky and James H. Martin. 2000. Speech and Language Processing. Prentice-Hall.
• Chris Manning and Hinrich Schuetze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.