438/538 Computational Linguistics Sandiway Fong Lecture 1: 8/22
Page 1

438/538 Computational Linguistics

Sandiway Fong

Lecture 1: 8/22

Page 2

Part 1

• Administrivia

Page 3

Administrivia

• Where
  – BIO W 212
• When
  – TR 12:30–1:45PM
• No Class
  – Thursday September 14th
  – Thursday September 28th
  – Thursday November 23rd (Thanksgiving)
• Office Hours
  – catch me after class, or
  – by appointment
  – Location: Douglass 311

Page 4

Administrivia

• Map

– Office (Douglass)

– Classroom (FCS)

Page 5

Administrivia

• Email
  – [email protected]
• Homepage
  – http://dingo.sbs.arizona.edu/~sandiway
• Lecture slides
  – available on homepage after each class
  – in both PowerPoint (.ppt) and Adobe PDF formats
    • animation: in PowerPoint
  – last year’s slides are available
    • (new material for this year will be rotated in)

Page 6

Administrivia

Reference Textbook
• Speech and Language Processing, Jurafsky & Martin, Prentice-Hall 2000
  – 21 chapters (900 pages)
  – Concepts, algorithms, heuristics
  – Sound/speech side
    • N. Warner Speech Tech LING 578 (this semester)
    • Y. Lin Statistical NLP LING 539 (Spring 2007)
  – Intersection with research areas
    • Parsing and Linguistic Theory (Sentence Processing)
    • Computational Morphology
    • Machine Translation, WordNet

Page 7

Administrivia

• Course Objectives
  – Theoretical
    • Introduction to a broad selection of natural language processing techniques
    • Survey course
    • Relevance to linguistic theory
  – Practical
    • Acquire some expertise
      – Parsing algorithms
      – Write grammars and machines
      – Build a toy machine translation system

Page 8

Administrivia

• Laboratory Exercises
  – To run tools and write grammars
  – you need access to computational facilities
    • use your PC (Windows, Linux) or Mac
  – Homework exercises

Page 9

Administrivia

• Homeworks and Grading
  – 6~8 homeworks
    • no final or mid-terms
    • mix of theoretical and practical exercises
    • there will be mandatory and extra credit questions
      – extra credit questions matter:
        » make up points lost on other questions in the homework
        » may bump you up a grade at the end of the semester in borderline situations
    • some simple programming is involved (no prerequisite)
    • use of a spreadsheet (Excel) for numerical calculation

Page 10

Administrivia

• Homeworks and Grading
  – Homeworks will be presented/explained in class
    • (good chance to ask questions)
  – Please attempt homeworks early
    • (then you can ask questions before the deadline)
  – Unless otherwise specified, you have one week to do the homework
    • (midnight deadline)
    • (email submission to me)
    • e.g. homework comes out on Thursday, it is due in my mailbox by next Thursday midnight
  – Look for acknowledgement email from me

Page 11

Administrivia

• Homework Ethics
  – you may discuss homework with your classmates
  – however, you must do the work and write it up independently
  – sources must be acknowledged,
    • e.g. if you borrow program code off the internet
    • discovered cheaters will be sanctioned
• Late Policy
  – all homework is mandatory
    • you can’t get an A skipping a homework
    • some homeworks may depend on earlier homeworks
  – deductions if late
  – if you know you are going to be late, notify me ahead of time
    • e.g. upcoming emergencies

Page 12

Administrivia

• 438 vs. 538
  538 = 438
        + 1 classroom presentation of a selected chapter
        + 438 extra credit homework questions are obligatory

Page 13

Administrivia

• There is a laptop being passed around
• Fill out spreadsheet entry
  – Name
  – Email
  – Year/Major
  – 438 or 538
  – Relevant background

Page 14

Administrivia

• Class demographics (8/20 classlist), 438/538
  (chart of enrollment by department; labels: LING, COSC, EPH, MATH, NDS, NMS, MGT, PSYC, SLA, ESL, APPL)

Page 15

Part 2

• Introduction

Page 16

Human Language Technology (HLT)

• ... is everywhere

• information is organized and accessed using language

Page 17

Human Language Technology (HLT)

Beginnings
• c. 1950 (just after WWII)
  – Electronic computers invented for
    • numerical analysis
    • code breaking

Killer Apps:
  – Language comprehension tasks and Machine Translation (MT)

Reference
  – Readings in Machine Translation
  – Eds. Nirenburg, S. et al. MIT Press 2003.
  – (Part 1: Historical Perspective)

Page 18

Human Language Technology (HLT)

• Cryptoanalysis Basis
  – early optimism

[Translation. Weaver, W.]
• Citing Shannon’s work, he asks:
• “If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”

Page 19

Human Language Technology (HLT)

• Popular in the early days and has undergone a modern revival

The Present Status of Automatic Translation of Languages (Bar-Hillel, 1960)

– “I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication”

– Much valuable time spent on gathering statistics
  • perhaps no longer a bottleneck

Page 20

Human Language Technology (HLT)

• uneasy relationship between linguistics and statistical analysis

Statistical Methods and Linguistics (Abney, 1996)

– Chomsky vs. Shannon
  • Statistics and low (zero) frequency items
    – Smoothing
  • No relation between order of approximation and grammaticality
  • Parameter estimation problem is intractable (for humans)
    – IBM (17 million parameters)
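
The zero-frequency problem and the smoothing remedy listed above can be illustrated with a small sketch. It is not taken from the slides or from Abney's paper: the toy corpus, the function names, and the choice of add-one (Laplace) smoothing are assumptions made purely for illustration, and other smoothing methods exist.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def mle_prob(w1, w2, unigrams, bigrams):
    """Unsmoothed estimate: P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

def add_one_prob(w1, w2, unigrams, bigrams):
    """Add-one estimate: (count(w1 w2) + 1) / (count(w1) + V), V = vocabulary size."""
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

# Hypothetical toy corpus (an assumption, not course data).
corpus = "the girls read the paper and the girls read the report".split()
unigrams, bigrams = bigram_counts(corpus)

# "the read" never occurs in the corpus, so the unsmoothed estimate is zero;
# add-one smoothing reserves a small amount of probability for it.
print(mle_prob("the", "read", unigrams, bigrams))
print(add_one_prob("the", "read", unigrams, bigrams))
```

With the unsmoothed estimate, any sentence containing an unseen word pair receives probability zero, however plausible it is. Smoothing moves some probability mass from seen pairs to unseen ones.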

Page 21

Human Language Technology (HLT)

• recent exciting developments in HLT
  – precipitated by progress in
    • computers: stochastic machine learning methods
    • storage: large amounts of training data
  – recent improvements in stochastic models from incorporating linguistic knowledge
  – (Hovy, MT Summit 2003)

Page 22

Human Language Technology (HLT)

• Killer Application?

Page 23

Natural Language Processing (NLP)
Computational Linguistics

• Question:
  – How to process natural languages on a computer
• Intersects with:
  – Computer science (CS)
  – Mathematics/Statistics
  – Artificial intelligence (AI)
  – Linguistic Theory
  – Psychology: Psycholinguistics
    • e.g. the human sentence processor

Page 24

Natural Language Properties

which properties are going to be difficult for computers to deal with?

• Grammar (Rules for putting words together into sentences)
  – How many rules are there?
    • 100, 1000, 10000, more …
  – Portions learnt or innate
  – Do we have all the rules written down somewhere?
• Lexicon (Dictionary)
  – How many words do we need to know?
    • 1000, 10000, 100000 …

Page 25

Computers vs. Humans

• Knowledge of language
  – Computers are way faster than humans
    • They kill us at arithmetic and chess
  – But human beings are so good at language, we often take our ability for granted
    • Processed without conscious thought
    • Exhibit complex behavior

(picture: IBM’s Deep Blue)

Page 26

Examples

• Innate Knowledge?
  – Which report did you file without reading?
  – (Parasitic gap sentence)
  – file(x,y)
  – read(u,v)

x = you
y = report
u = x = you
v = y = report
and there are no other possible interpretations

*the report was filed without reading

Page 27

Examples

• Changes in interpretation
  • John is too stubborn to talk to
  • John is too stubborn to talk to Bill

talk_to(x,y)

(1) x = arbitrary person, y = John

(2) x = John, y = Bill

Page 28

Examples

• Ambiguity
  – Where can I see the bus stop?
  – stop: verb or part of the noun-noun compound bus stop
  – Context (Discourse or situation)
  – Where can I see [the [NN bus stop]]?
  – Where can I see [[the bus] [V stop]]?
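
The two bracketings above can be reproduced with a toy parser. This is a minimal sketch under stated assumptions, not the course's method: the grammar, the lexicon, the category label SC (for the "the bus stop" = noun phrase plus verb reading), and the function names are all hypothetical and cover only this one phrase.

```python
# Hypothetical toy grammar: each category maps to its possible expansions.
GRAMMAR = {
    "NP": [["Det", "N"], ["Det", "NN"]],
    "NN": [["N", "N"]],          # noun-noun compound
    "SC": [["NP", "V"]],         # noun phrase followed by a verb
}

# Hypothetical lexicon: "stop" is listed as both a noun and a verb.
LEXICON = {
    "the": {"Det"},
    "bus": {"N"},
    "stop": {"N", "V"},
}

def parse(symbol, words):
    """Return every bracketing of `words` as category `symbol`."""
    results = []
    if len(words) == 1 and symbol in LEXICON.get(words[0], set()):
        results.append(f"[{symbol} {words[0]}]")
    for rhs in GRAMMAR.get(symbol, []):
        for children in expand(rhs, words):
            results.append(f"[{symbol} " + " ".join(children) + "]")
    return results

def expand(rhs, words):
    """Yield lists of child bracketings that cover `words` with the symbols in `rhs`."""
    if not rhs:
        if not words:
            yield []
        return
    first, rest = rhs[0], rhs[1:]
    # Leave at least one word for each remaining symbol.
    for i in range(1, len(words) - len(rest) + 1):
        for left in parse(first, words[:i]):
            for right in expand(rest, words[i:]):
                yield [left] + right

words = ["the", "bus", "stop"]
parses = parse("NP", words) + parse("SC", words)
for p in parses:
    print(p)
```

Because the lexicon lists stop under two categories, the same three words receive two analyses. Nothing in the grammar chooses between them, which is where context (discourse or situation) has to come in.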

Page 29

Examples

• Ungrammaticality
  – *Which book did you file the report without reading?
  – * = ungrammatical
    • relative
  – ungrammatical vs. incomprehensible

Page 30

Example

• The human parser has quirks
  • Ian told the man that he hired a story
  • Ian told the man that he hired a secretary
• Garden-pathing
• Temporary ambiguity
• tell: multiple syntactic frames for the verb
  • Ian told [the man that he hired] [a story]
  • Ian told [the man] [that he hired a secretary]

Page 31

Examples

• More subtle differences
  • The reporter who the senator attacked admitted the error
  • The reporter who attacked the senator admitted the error
  – Processing time differences
    • Subject vs. object relative clauses
  – Q: Do we want to mimic the human parser completely?

Page 32

Frequently Asked Questions from the Linguistic Society of America (LSA)

• http://www.lsadc.org/info/ling-faqs.cfm

Page 33

LSA (Linguistic Society of America) pamphlet

• by Ray Jackendoff

• A Linguist’s Perspective on What’s Hard for Computers to Do …

– is he right?


Page 34

If computers are so smart, why can't they use simple English?

• Consider, for instance, the four letters read; they can be pronounced as either reed or red. How does the machine know in each case which is the correct pronunciation? Suppose it comes across the following sentences:
• (1) The girls will read the paper. (reed)
• (2) The girls have read the paper. (red)
• We might program the machine to pronounce read as reed if it comes right after will, and red if it comes right after have. But then sentences (3) through (5) would cause trouble.
• (3) Will the girls read the paper? (reed)
• (4) Have any men of good will read the paper? (red)
• (5) Have the executors of the will read the paper? (red)
• How can we program the machine to make this come out right?
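
The rule proposed above (reed right after will, red right after have) can be written down directly to see where it breaks. This is a minimal sketch: the function names and the simple whitespace tokenization are assumptions for illustration; the sentences and expected pronunciations are the ones in (1) through (5).

```python
def pronounce_read_naive(sentence):
    """Guess the pronunciation of 'read' from the single word right before it.

    Returns 'reed', 'red', or None when the rule does not apply.
    """
    words = sentence.lower().strip("?.!").split()
    i = words.index("read")
    previous = words[i - 1] if i > 0 else None
    if previous == "will":
        return "reed"
    if previous == "have":
        return "red"
    return None

# Sentences (1)-(5) with the pronunciations given in the text.
EXAMPLES = [
    ("The girls will read the paper.", "reed"),
    ("The girls have read the paper.", "red"),
    ("Will the girls read the paper?", "reed"),
    ("Have any men of good will read the paper?", "red"),
    ("Have the executors of the will read the paper?", "red"),
]

def failures(examples):
    """Return the sentences on which the naive rule gives the wrong answer."""
    return [s for s, expected in examples if pronounce_read_naive(s) != expected]

for sentence in failures(EXAMPLES):
    print("wrong:", sentence)
```

The rule handles (1) and (2) but fails on (3) through (5). In (3) the auxiliary is not adjacent to read, and in (4) and (5) the adjacent will is a noun, so looking at the neighboring word alone is not enough.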

Page 35

If computers are so smart, why can't they use simple English?

• (6) Have the girls who will be on vacation next week read the paper yet? (red)
• (7) Please have the girls read the paper. (reed)
• (8) Have the girls read the paper? (red)
• Sentence (6) contains both have and will before read, and both of them are auxiliary verbs. But will modifies be, and have modifies read. In order to match up the verbs with their auxiliaries, the machine needs to know that the girls who will be on vacation next week is a separate phrase inside the sentence.
• In sentence (7), have is not an auxiliary verb at all, but a main verb that means something like 'cause' or 'bring about'. To get the pronunciation right, the machine would have to be able to recognize the difference between a command like (7) and the very similar question in (8), which requires the pronunciation red.

Page 36

Next time …

• We will begin by introducing you to a programming language you will become familiar with
  – Two introductory lectures
  – Name: PROLOG (Programming in Logic)
  – Variant: SWI-PROLOG (free software from University of Amsterdam)
    • Download: http://www.swi-prolog.org/
    • Install it on your PC or Mac
  – Based on mathematical logic
    • logic and inference are useful tools
  – Contains built-in grammar rules
    • programming language was originally designed for NLP

Page 37

Prolog Resources

• Some background in programming?
• Useful Online Tutorials
  – An introduction to Prolog
    • (Michel Loiseleur & Nicolas Vigier)
    • http://invaders.mars-attacks.org/~boklm/prolog/
  – Learn Prolog Now!
    • (Patrick Blackburn, Johan Bos & Kristina Striegnitz)
    • http://www.coli.uni-saarland.de/~kris/learn-prolog-now/lpnpage.php?pageid=online

Page 38

Prolog Resources

• No background at all?
• Audit
  – LING 388 Computers and Language
    • (also taught by me)
    • first couple of weeks
    • introduces Prolog at a more gentle pace
    • uses lab classes for practice
• Lectures TR 3:30–4:45pm
  – Harvill 313
• Hands-on Lab Class: this Thursday
  – Social Sciences 224