Computational Linguistics INTroduction Lecture 1 Computers and Language
Computational Linguistics INTroduction

Lecture 1

Computers and Language

Feb 2010 -- MR CLINT - Lecture 1 2

Course Information

Course Website: http://staff.um.edu.mt/mros1/lin2160

[email protected]@um.edu.mt

Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN 978-0-13-504196-3

Natural Language Toolkit (NLTK): http://www.nltk.org/

CL: Two Main Disciplines

[Diagram: COMP SCI and LINGUISTICS, with language and computers between them]

Language and Computers includes …

Natural Language Processing (NLP)
  Computational models of language analysis, interpretation, and generation.
  Syntax/semantics interface

Human Language Technology
  Emphasis on large-scale performance
  Example 1: Google search
  Example 2: speech technology

Computational Linguistics
  Emphasis on mechanised linguistic theories.
  Grew out of early Machine Translation efforts

Linguistics

Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use

Noam Chomsky

Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.

Chomsky has been the dominant figure in linguistics ever since.

Chomsky invented the generative approach to grammar.

Generative Grammar: Some Key Points

Theory of grammar includes mathematical definition of what a grammar is.
A language is a (possibly infinite) set of sentences.
But a grammar is finite.
Grammar generates all and only sentences of a language.
  Undergeneration
  Overgeneration

[source: Sag & Wasow]
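The "all and only" requirement can be illustrated with plain sets. The sketch below is only an illustration under simplifying assumptions: the language L and the outputs of three grammars are modelled as small finite sets, and the names `diagnose`, `g_under`, `g_over` and `g_exact` are invented for this example.

```python
# Toy target language, modelled as a finite set of sentences (an assumption;
# real languages are typically infinite).
L = {"John kicks Bill", "Bill kicks John"}


def diagnose(generated, language):
    """Classify what a grammar generates relative to the target language."""
    missing = language - generated    # sentences of L that are not generated
    spurious = generated - language   # generated strings that are not in L
    if not missing and not spurious:
        return "all and only"
    labels = []
    if missing:
        labels.append("undergeneration")
    if spurious:
        labels.append("overgeneration")
    return " + ".join(labels)


# Hypothetical outputs of three grammars.
g_under = {"John kicks Bill"}            # only sentences of L, but not all
g_over = L | {"kicks John Bill"}         # all of L, plus a non-sentence
g_exact = set(L)                         # exactly L

for name, g in [("g_under", g_under), ("g_over", g_over), ("g_exact", g_exact)]:
    print(name, "->", diagnose(g, L))
```

A grammar can also do both at once (miss some sentences and generate some non-sentences), which is why the function collects both labels.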

Generative Power of a Grammar

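[Figure: three diagrams relating the set G of sentences generated by a grammar to the language L]

undergeneration: G generates only but not all sentences of L
overgeneration: G generates all but not only sentences of L
all and only: G and L coincide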

Formal Grammar

Grammar is a set of rewrite rules
Rules have the form LHS → RHS
  LHS can be rewritten as RHS
  LHS & RHS are sequences made of words or symbols
Lexicon specifies words and their categories
  Category → word
  Category can be rewritten as word

A Simple Grammar/Lexicon

grammar:
S → NP VP
NP → N
VP → V NP

lexicon:
V → kicks
N → John
N → Bill

tree:
[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
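The grammar and lexicon above are small enough to enumerate every sentence they generate. The following is a minimal sketch, not part of the lecture: the dictionary encoding and the function name `expand` are assumptions made for illustration, and the approach only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# Rewrite rules from the slide: each category maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}


def expand(symbol):
    """Return every word sequence that can be derived from `symbol`."""
    if symbol not in GRAMMAR:
        # Not a category, so it is a word: it derives only itself.
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = sorted(" ".join(words) for words in expand("S"))
for s in sentences:
    print(s)
```

The grammar generates four sentences, including "John kicks John" and "Bill kicks Bill"; whether those count as overgeneration depends on what the intended language is.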

Formal v. Natural Languages

Formal Languages
  Arithmetic: 3290 1 1010101
  Logic: ∀x man(x) → mortal(x)
  URL: http://www.cs.um.edu.mt

Natural Languages
  English: John saw the dog
  German: Johann hat den Hund gesehen
  Maltese: Ġianni ra kelb

Some Points of Similarity

Sentences are sequences of words (or symbols).

Rules determine which sequences are valid sentences.

Sentences have a definite structure.

Sentence structure is systematically related to meaning.

Structure Affects Meaning

I shot an elephant in my trousers
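The sentence has two structures: "in my trousers" can attach to "an elephant" or to "shot an elephant". The sketch below counts the structures with a simple chart (CKY-style) procedure. The grammar and lexicon are a hypothetical toy fragment written for this example, not taken from the lecture, and `count_parses` is an invented name.

```python
from collections import defaultdict

# Hypothetical toy grammar: binary rules written as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories.
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "elephant": ["N"],
    "in": ["P"], "my": ["Det"], "trousers": ["N"],
}


def count_parses(words, start="S"):
    """Count the distinct trees rooted in `start` that cover all of `words`."""
    n = len(words)
    # chart[i][j][cat] = number of trees of category cat spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I shot an elephant in my trousers".split()))  # 2
print(count_parses("I shot an elephant".split()))                 # 1
```

Each of the two trees corresponds to a different meaning: either the elephant was in the trousers, or the shooting took place there.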

Points of Difference

Formal Languages
  The grammar defines the language
  Restricted application
  Non-ambiguous

Natural Languages
  The language defines the grammar
  Universal application
  Highly ambiguous

Ambiguity

Morphological Ambiguity: en-large-ment
Lexical Ambiguity: Iraqi Head Seeks Arms
Syntactic Ambiguity: small animals and children laugh
Semantic Ambiguity: every girl loves a sailor
Pragmatic Ambiguity: can you pass the salt?

The management of ambiguity is central to the success of CL

I made her duck

I cooked a duck for her
I cooked a duck belonging to her
I created a duck for her
I created a duck that now belongs to her
I caused her to lower her head
I turned her into a duck

Computer Science

The study of basic concepts
  Information
  Data
  Algorithm
  Program

The application of these concepts to practical tasks.

Implementation of computational models from other fields (meteorology, ..., linguistics)

Information Data Algorithm Program

Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty.
The units of this measure are called bits.
  Length – metres
  Weight – kilos
  Information – bits
1 bit is the amount of uncertainty inherent in a situation when there are exactly two possible outcomes.
  Example: for breakfast I will have coffee or I will have tea (nothing else).
  When I tell you that I have tea, I have conveyed one bit of information.
The greater the number of possible outcomes, the more bits of information involved in the statement that indicates the actual outcome.
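As a rough numerical illustration: if all outcomes are assumed to be equally likely (an assumption the slide does not state), naming one of n outcomes conveys log2(n) bits. The function name `bits` is made up for this sketch.

```python
import math


def bits(outcomes):
    """Bits conveyed by naming one of `outcomes` equally likely possibilities."""
    if outcomes < 1:
        raise ValueError("there must be at least one possible outcome")
    return math.log2(outcomes)


print(bits(2))  # coffee or tea: 1.0 bit
print(bits(8))  # eight possible breakfasts: 3.0 bits
```

With only one possible outcome the result is 0 bits: being told what was already certain conveys no information.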

Information Data Algorithm Program

Data: a formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
  Example: a telephone directory
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
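The structure of the telephone directory can be written down directly as a data structure. This is a minimal sketch: the class name `Entry`, the function `lookup`, and all names, addresses and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One directory entry: a name, an address, and a number."""
    name: str
    address: str
    number: str


# The directory is a list of entries (all values below are made up).
directory = [
    Entry("A. Borg", "12 Main Street", "2100 0001"),
    Entry("M. Vella", "3 Church Square", "2100 0002"),
]


def lookup(entries, name):
    """Return the numbers of all entries with the given name."""
    return [entry.number for entry in entries if entry.name == name]


print(lookup(directory, "A. Borg"))
```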

Information Data Algorithm Program

Algorithm: a completely defined procedure for the solution of a given problem in a finite number of steps
  Designed for a well-defined task.
  Finite description length.
  Guaranteed to terminate.
  Abstract

Algorithm for Chocolate Cake

Program to Add X and Y

[Flowchart]
Read X and Y (example: X = 2, Y = 3)
X = 0?
  yes: Output Y
  no: subtract 1 from X, add 1 to Y, then test X = 0 again
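The flowchart can be sketched in Python. The sketch assumes that the X = 0? test is made before each subtract/add pair and that X is a non-negative integer; the function name `add` is chosen for illustration.

```python
def add(x, y):
    """Add x to y by repeatedly moving 1 from x to y, as in the flowchart."""
    if x < 0:
        raise ValueError("x must be non-negative, otherwise the loop never ends")
    while x != 0:   # the "X = 0?" test
        x -= 1      # subtract 1 from X
        y += 1      # add 1 to Y
    return y        # Output Y


print(add(2, 3))  # 5
```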

Computer Program

A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.

Concrete
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!

Instructions vs. Execution Steps

1. Read X

2. Read Y

3. X = X-1

4. Y = Y+1

5. If X = 0 then Print(Y) else goto 3

How many instructions?

How many execution steps?
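One way to explore the two questions is to simulate the numbered program and count. In this sketch each numbered line counts as one instruction and each time a line is carried out counts as one execution step. The function name `run` is an assumption, the function returns Y (the running sum) on the assumption that the sum is the intended output, and X is required to be at least 1 because the program decrements before it tests.

```python
INSTRUCTIONS = 5  # the program text has five numbered instructions


def run(x, y):
    """Simulate the program; return (final Y, number of execution steps)."""
    if x < 1:
        raise ValueError("x must be at least 1, otherwise X never reaches 0")
    steps = 2              # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1             # instruction 3
        y += 1             # instruction 4
        steps += 3         # instructions 3 and 4, plus the test in instruction 5
        if x == 0:         # instruction 5: stop, or go back to instruction 3
            return y, steps


print(INSTRUCTIONS)   # 5
print(run(2, 3))      # (5, 8)
```

The number of instructions is fixed by the program text, while the number of execution steps depends on the input: under these counting assumptions it is 2 + 3X.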

Algorithms and Linguistics

Do linguistic theories in the abstract make sense?

Linguistic theories explain linguistic knowledge in the form of
  grammar rules
  theories about grammar rules

But performance involves processing issues:

Computational Linguistics – Issues

How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular intended meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?

Computers and Language: Twin Goals

Scientific Goal: Contribute to Linguistics by adding a computational dimension.

Technological Goal: Develop machinery capable of handling human language that can support “language engineering”.

Computers and Language: Tools & Resources

Grammar Formalisms, e.g. Definite Clause Grammars
Parsing Algorithms: sentence → structure
Generation Algorithms: structure → sentence
Statistical Methods
Linguistic Corpora

Computers and Language: Applications

Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Multimodal Interaction
Machine Translation

LECTURES

1 Overview

2 Chomsky Hierarchy

3 Chomsky Hierarchy

4 Chomsky Hierarchy

5 Computational Syntax

6 Agreement & Subcategorisation

7 Computational Syntax

8 Computational Syntax

9 Corpora, Tools and Techniques

10 Morphology

11 Computational Morphology

12 Computational Morphology

13 Computational Morphology

14 Revision