Speech and Language Processing
An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Second Edition
Daniel Jurafsky, Stanford University
James H. Martin, University of Colorado at Boulder
PEARSON Prentice Hall, Upper Saddle River, New Jersey 07458
Summary of Contents
Foreword xxiii
Preface xxv
About the Authors xxxi
2.1.1 Basic Regular Expression Patterns 18
2.1.2 Disjunction, Grouping, and Precedence 21
2.1.3 A Simple Example 22
2.1.4 A More Complex Example 23
2.1.5 Advanced Operators 24
2.1.6 Regular Expression Substitution, Memory, and ELIZA 25
2.2 Finite-State Automata 26
2.2.1 Use of an FSA to Recognize Sheeptalk 27
2.2.2 Formal Languages 30
2.2.3 Another Example 31
2.2.4 Non-Deterministic FSAs 32
2.2.5 Use of an NFSA to Accept Strings 33
2.2.6 Recognition as Search 35
2.2.7 Relation of Deterministic and Non-Deterministic Automata 38
2.3 Regular Languages and FSAs 38
2.4 Summary 41
Bibliographical and Historical Notes 42
Exercises 42
3 Words and Transducers 45
3.1 Survey of (Mostly) English Morphology 47
3.2 Finite-State Morphological Parsing 52
3.3 Construction of a Finite-State Lexicon 54
3.4 Finite-State Transducers 57
3.4.1 Sequential Transducers and Determinism 59
3.5 FSTs for Morphological Parsing 60
3.6 Transducers and Orthographic Rules 62
3.7 The Combination of an FST Lexicon and Rules 65
3.8 Lexicon-Free FSTs: The Porter Stemmer 68
3.9 Word and Sentence Tokenization 68
3.9.1 Segmentation in Chinese 70
3.10 Detection and Correction of Spelling Errors 72
3.11 Minimum Edit Distance 73
3.12 Human Morphological Processing 77
3.13 Summary 79
Bibliographical and Historical Notes 80
Exercises 81
4 N-Grams 83
4.1 Word Counting in Corpora 85
4.2 Simple (Unsmoothed) N-Grams 86
4.3 Training and Test Sets 91
4.3.1 N-Gram Sensitivity to the Training Corpus 92
4.3.2 Unknown Words: Open Versus Closed Vocabulary Tasks 95
4.7.1 Advanced: Details of Computing Katz Backoff α and P* 107
4.8 Practical Issues: Toolkits and Data Formats 108
4.9 Advanced Issues in Language Modeling 109
4.9.1 Advanced Smoothing Methods: Kneser-Ney Smoothing 109
4.9.2 Class-Based N-Grams 111
4.9.3 Language Model Adaptation and Web Use 112
4.9.4 Using Longer-Distance Information: A Brief Summary
4.10 Advanced: Information Theory Background
4.10.1 Cross-Entropy for Comparing Models
4.11 Advanced: The Entropy of English and Entropy Rate Constancy
4.12 Summary
Bibliographical and Historical Notes
Exercises
5 Part-of-Speech Tagging
5.1 (Mostly) English Word Classes
5.2 Tagsets for English
5.3 Part-of-Speech Tagging
5.4 Rule-Based Part-of-Speech Tagging
5.5 HMM Part-of-Speech Tagging
5.5.1 Computing the Most Likely Tag Sequence: An Example
5.5.2 Formalizing Hidden Markov Model Taggers
5.5.3 Using the Viterbi Algorithm for HMM Tagging
5.5.4 Extending the HMM Algorithm to Trigrams
5.6 Transformation-Based Tagging
5.6.1 How TBL Rules Are Applied
5.6.2 How TBL Rules Are Learned
5.7 Evaluation and Error Analysis
5.7.1 Error Analysis
5.8 Advanced Issues in Part-of-Speech Tagging
5.8.1 Practical Issues: Tag Indeterminacy and Tokenization
5.8.2 Unknown Words
5.8.3 Part-of-Speech Tagging for Other Languages
5.8.4 Tagger Combination
5.9 Advanced: The Noisy Channel Model for Spelling
5.9.1 Contextual Spelling Error Correction
5.10 Summary
Bibliographical and Historical Notes
Exercises
6 Hidden Markov and Maximum Entropy Models
6.1 Markov Chains
6.2 The Hidden Markov Model
6.3 Likelihood Computation: The Forward Algorithm
6.4 Decoding: The Viterbi Algorithm
6.5 HMM Training: The Forward-Backward Algorithm
6.6 Maximum Entropy Models: Background
6.6.1 Linear Regression
6.6.2 Logistic Regression
6.6.3 Logistic Regression: Classification
6.6.4 Advanced: Learning in Logistic Regression
7.4 Acoustic Phonetics and Signals 230
7.4.1 Waves 230
7.4.2 Speech Sound Waves 231
7.4.3 Frequency and Amplitude; Pitch and Loudness 233
7.4.4 Interpretation of Phones from a Waveform 236
7.4.5 Spectra and the Frequency Domain 236
7.4.6 The Source-Filter Model 240
8.3.3 Tune
8.3.4 More Sophisticated Models: ToBI
8.3.5 Computing Duration from Prosodic Labels
8.3.6 Computing F0 from Prosodic Labels
8.3.7 Final Result of Text Analysis: Internal Representation
8.4 Diphone Waveform Synthesis
8.4.1 Steps for Building a Diphone Database
8.4.2 Diphone Concatenation and TD-PSOLA for Prosody
9.5 The Lexicon and Language Model
9.6 Search and Decoding
9.7 Embedded Training
9.8 Evaluation: Word Error Rate
9.9 Summary
Bibliographical and Historical Notes
Exercises
18.3.1 Store and Retrieve Approaches
18.3.2 Constraint-Based Approaches
18.4 Unification-Based Approaches to Semantic Analysis
18.5 Integration of Semantics into the Earley Parser
18.6 Idioms and Compositionality
18.7 Summary
Bibliographical and Historical Notes
Exercises
19 Lexical Semantics
19.1 Word Senses
19.2 Relations Between Senses
19.2.1 Synonymy and Antonymy
19.2.2 Hyponymy
19.2.3 Semantic Fields
19.3 WordNet: A Database of Lexical Relations
19.4 Event Participants
19.4.1 Thematic Roles
19.4.2 Diathesis Alternations
19.4.3 Problems with Thematic Roles
19.4.4 The Proposition Bank
19.4.5 FrameNet
19.4.6 Selectional Restrictions
19.5 Primitive Decomposition
19.6 Advanced: Metaphor
19.7 Summary
21.4.1 Five Types of Referring Expressions 698
21.4.2 Information Status 700
21.5 Features for Pronominal Anaphora Resolution 701
21.5.1 Features for Filtering Potential Referents 701
21.5.2 Preferences in Pronoun Interpretation 702
21.6 Three Algorithms for Anaphora Resolution 704
21.6.1 Pronominal Anaphora Baseline: The Hobbs Algorithm 704
21.6.2 A Centering Algorithm for Anaphora Resolution 706
21.6.3 A Log-Linear Model for Pronominal Anaphora Resolution 708
21.6.4 Features for Pronominal Anaphora Resolution 709
22 Information Extraction 725
22.1 Named Entity Recognition 727
22.1.1 Ambiguity in Named Entity Recognition 729
22.1.2 NER as Sequence Labeling 729
22.1.3 Evaluation of Named Entity Recognition 732
22.1.4 Practical NER Architectures 734
22.2 Relation Detection and Classification 734
22.2.1 Supervised Learning Approaches to Relation Analysis 735
22.2.2 Lightly Supervised Approaches to Relation Analysis 738
22.2.3 Evaluation of Relation Analysis Systems 742
23 Question Answering and Summarization 765
23.1 Information Retrieval 767
23.1.1 The Vector Space Model 768
23.1.2 Term Weighting 770
23.1.3 Term Selection and Creation 772
23.1.4 Evaluation of Information-Retrieval Systems 772
23.1.5 Homonymy, Polysemy, and Synonymy 776
23.1.6 Ways to Improve User Queries 776
25.2 Classical MT and the Vauquois Triangle 867
25.2.1 Direct Translation 868
25.2.2 Transfer 870
25.2.3 Combined Direct and Transfer Approaches in Classic MT 872
25.2.4 The Interlingua Idea: Using Meaning 873
25.3 Statistical MT 874
25.4 P(F|E): The Phrase-Based Translation Model 877
25.5 Alignment in MT 879
25.5.1 IBM Model 1 880
25.5.2 HMM Alignment 883
25.6 Training Alignment Models 885
25.6.1 EM for Training Alignment Models 886
25.7 Symmetrizing Alignments for Phrase-Based MT 888
25.8 Decoding for Phrase-Based Statistical MT 890
25.9 MT Evaluation 894
25.9.1 Using Human Raters 894
25.9.2 Automatic Evaluation: BLEU 895
25.10 Advanced: Syntactic Models for MT 898
25.11 Advanced: IBM Model 3 and Fertility 899
25.11.1 Training for Model 3 903
25.12 Advanced: Log-Linear Models for MT 903
25.13 Summary 904
Bibliographical and Historical Notes 905
Exercises 907