Artificial Intelligence (Inteligenta Artificiala)
Universitatea Politehnica Bucuresti
Academic year 2003-2004
Adina Magda Florea
http://turing.cs.pub.ro/ia_2005

Dec 27, 2015


Lecture 12

Natural Language Processing (Prelucrarea limbajului natural)

Defining Languages with Backus-Naur Form (BNF)

A formal language is defined as a set of strings, where each string is a sequence of symbols.

The languages of interest here all consist of an infinite set of strings, so we need a concise way to characterize the set: we use a grammar.

Terminal symbols
– Symbols or words that make up the strings of the language
– Example: the set of symbols for the language of simple arithmetic expressions is {0,1,2,3,4,5,6,7,8,9,+,-,*,/,(,)}

Components in a BNF Grammar

Nonterminal symbols
– Categorize subphrases of the language
– Example: the nonterminal symbol NP (NounPhrase) denotes an infinite set of strings, including “you” and “the big dog”

Components in a BNF Grammar

Start symbol
– Nonterminal symbol that denotes the complete strings of the language

Set of rewrite rules or productions
– LHS → RHS
– LHS is a nonterminal
– RHS is a sequence of zero or more symbols (either terminal or nonterminal)

Example: BNF Grammar for Simple Arithmetic Expressions

Exp → Exp Operator Exp | ( Exp ) | Number
Number → Digit | Number Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Operator → + | - | * | /
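
The grammar for arithmetic expressions is left-recursive and ambiguous, so a recursive-descent recognizer cannot follow it rule by rule. The Python sketch below assumes an equivalent formulation: an operand (a number or a parenthesized expression) followed by zero or more operator/operand pairs. The function names are illustrative and not part of the lecture.

```python
DIGITS = set("0123456789")
OPERATORS = set("+-*/")


def parse_operand(text, pos):
    """Match Number or ( Exp ) at pos; return the end position or None."""
    if pos < len(text) and text[pos] == "(":
        inner = parse_exp(text, pos + 1)
        if inner is not None and inner < len(text) and text[inner] == ")":
            return inner + 1
        return None
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1  # Number -> Digit | Number Digit, i.e. one or more digits
    return end if end > pos else None


def parse_exp(text, pos):
    """Match operand (Operator operand)* at pos; return the end position or None."""
    pos = parse_operand(text, pos)
    while pos is not None and pos < len(text) and text[pos] in OPERATORS:
        pos = parse_operand(text, pos + 1)
    return pos


def is_expression(text):
    """True if the whole string belongs to the language of Exp."""
    return parse_exp(text, 0) == len(text)
```

Under these assumptions `is_expression("(1+2)*34")` holds, while `"1+"`, `"-3"` and `"2 + 3"` are rejected: the grammar has no unary minus and no whitespace among its terminal symbols.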

The Component Steps of Communication

A typical communication, in which the speaker S wants to transmit the proposition P to the hearer H using words W, is composed of 7 processes:
– 3 take place in the speaker
– 4 take place in the hearer

Processes in the Speaker

Intention
– S wants H to believe P (where S typically believes P)

Generation
– S chooses the words W (because they express the meaning P)

Synthesis
– S utters the words W (usually addressing them to H)

Processes in the Hearer

Perception
– H perceives W’ (ideally W’ = W, but misperception is possible)

Analysis
– H infers that W’ has possible meanings P1, …, Pn (words and phrases can have several meanings)

Processes in the Hearer

Disambiguation
– H infers that S intended to express Pi (where ideally Pi = P, but misinterpretation is possible)

Incorporation
– H decides to believe Pi (or rejects it if it is out of line with what H already believes)

Observations

If the perceived input is spoken, perception is speech recognition.

If the perceived input is handwritten, perception is handwriting recognition.

Neural networks have been used successfully for both speech recognition and handwriting recognition.

Observations

Analysis, disambiguation and incorporation together form natural language understanding, which relies on the assumption that the words of the sentence are known.

Often, the recognition of individual words is driven by the sentence structure, so perception and analysis interact, as do analysis, disambiguation, and incorporation.

Defining a Grammar

Lexicon - list of allowable vocabulary words, grouped in categories (parts of speech):
– open classes - words are added to the category all the time (natural language is dynamic, it constantly evolves)
– closed classes - small number of words; new words are generally not expected to be added

Example - A Small Lexicon

Noun → stench | breeze | wumpus | …
Verb → is | see | smell | …
Adjective → right | left | smelly | …
Adverb → here | there | ahead | …
Pronoun → me | you | I | it
RelPronoun → that | who
Name → John | Mary
Article → the | a | an
Preposition → to | in | on
Conjunction → and | or | but
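
As a rough illustration, the small lexicon can be stored as a mapping from category to words and inverted to look up the categories of a word. This Python sketch contains only the words listed on the slide (the open classes are truncated there); `categories_of` and `tag` are hypothetical helper names.

```python
LEXICON = {
    "Noun": ["stench", "breeze", "wumpus"],
    "Verb": ["is", "see", "smell"],
    "Adjective": ["right", "left", "smelly"],
    "Adverb": ["here", "there", "ahead"],
    "Pronoun": ["me", "you", "I", "it"],
    "RelPronoun": ["that", "who"],
    "Name": ["John", "Mary"],
    "Article": ["the", "a", "an"],
    "Preposition": ["to", "in", "on"],
    "Conjunction": ["and", "or", "but"],
}


def categories_of(word):
    """Return the sorted list of lexical categories that contain the word."""
    return sorted(cat for cat, words in LEXICON.items() if word in words)


def tag(sentence):
    """Pair every whitespace-separated word with its possible categories."""
    return [(word, categories_of(word)) for word in sentence.split()]
```

A word outside the lexicon, such as `dog`, gets an empty category list, which is how a parser built on this table would detect an unknown word.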

The Grammar Associated to the Lexicon

Combine the words into phrases.

Use nonterminal symbols to define different kinds of phrases:
– sentence S
– noun phrase NP
– verb phrase VP
– prepositional phrase PP
– relative clause RelClause

Example - The Grammar Associated to the Lexicon

S → NP VP | S Conjunction S
NP → Pronoun | Noun | Article Noun | NP PP | NP RelClause
VP → Verb | VP NP | VP Adjective | VP PP | VP Adverb
PP → Preposition NP
RelClause → RelPronoun VP

Syntactic Analysis (Parsing)

Parsing is the problem of constructing a derivation tree for an input string from a formal definition of a grammar.

Parsing algorithms may be divided into two classes:
– top-down parsing
– bottom-up parsing

Top-Down Parsing

Start with the top-level sentence symbol and attempt to build a tree whose leaves match the target sentence's words (the terminals).

Better if there are many alternative terminal symbols for each word.

Worse if there are many alternative rules for a phrase.
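
The top-down strategy can be sketched in Python as a depth-first search that always expands the leftmost nonterminal and abandons a branch as soon as a generated word disagrees with the input. The grammar fragment below is an assumption, chosen to be just large enough for a sentence such as "John hit the ball" (with John treated as a Noun); `derive` is a hypothetical name.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"], ["Article", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Noun": [["John"], ["ball"]],
    "Verb": [["hit"]],
    "Article": [["the"]],
}


def derive(form, words):
    """Return the list of sentential forms leading from form to words, or None."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            break  # leftmost nonterminal found at index i
        if i >= len(words) or words[i] != symbol:
            return None  # a terminal already contradicts the input
    else:
        # Only terminals are left: succeed if they are exactly the input.
        return [form] if form == words else None
    if len(form) > len(words):
        return None  # no rule has an empty right-hand side, so prune
    for rhs in GRAMMAR[symbol]:
        rest = derive(form[:i] + rhs + form[i + 1:], words)
        if rest is not None:
            return [form] + rest
    return None
```

Calling `derive(["S"], ["John", "hit", "the", "ball"])` returns the successive forms from `["S"]` down to the sentence itself. This sketch terminates only because the assumed grammar has no left-recursive rule; a rule such as NP → NP PP would need a different search strategy.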

Example for Top-Down Parsing

"John hit the ball"
1. S
2. S → NP, VP
3. S → Noun, VP
4. S → John, Verb, NP
5. S → John, hit, NP
6. S → John, hit, Article, Noun
7. S → John, hit, the, Noun
8. S → John, hit, the, ball

Bottom-Up Parsing

Start with the words in the sentence (the terminals) and attempt to find a series of reductions that yields the sentence symbol.

Better if there are many alternative rules for a phrase.

Worse if there are many alternative terminal symbols for each word.

Example for Bottom-Up Parsing

1. John, hit, the, ball
2. Noun, hit, the, ball
3. Noun, Verb, the, ball
4. Noun, Verb, Article, ball
5. Noun, Verb, Article, Noun
6. NP, Verb, Article, Noun
7. NP, Verb, NP
8. NP, VP
9. S
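
The reduction sequence of this example can be reproduced with a naive bottom-up search. The Python sketch below is one possible reading, not the lecture's algorithm: it tries every rule at every position, backtracks when a reduction leads nowhere, and remembers the forms that already failed. The rule set is an assumption limited to this sentence.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Noun",)),
    ("NP", ("Article", "Noun")),
    ("VP", ("Verb", "NP")),
    ("Noun", ("John",)),
    ("Noun", ("ball",)),
    ("Verb", ("hit",)),
    ("Article", ("the",)),
]


def reduce_to_start(form, start="S", failed=None):
    """Return the forms leading from form (a tuple) to (start,), or None."""
    if failed is None:
        failed = set()
    if form == (start,):
        return [form]
    if form in failed:
        return None  # already explored without success
    failed.add(form)
    for i in range(len(form)):
        for lhs, rhs in RULES:
            if form[i:i + len(rhs)] == rhs:
                # Replace the matched right-hand side by its left-hand side.
                reduced = form[:i] + (lhs,) + form[i + len(rhs):]
                rest = reduce_to_start(reduced, start, failed)
                if rest is not None:
                    return [form] + rest
    return None
```

Backtracking is needed even for this tiny grammar: reducing the final Noun to NP too early leaves `Article, NP`, which no rule can reduce. The order in which the search applies the reductions may differ from the slide, but it starts from the words and ends at S.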

Definite Clause Grammar (DCG)

Problems with BNF grammars
– BNF only talks about strings, not meanings
– We want to describe context-sensitive grammars, but BNF is context-free

Introduce a formalism that can handle both of these problems.

Use first-order logic to talk about strings and their meanings.

Definite Clause Grammar (DCG)

We are interested in using language for communication, so we need some way of associating a meaning with each string.

Each nonterminal symbol becomes a one-place predicate that is true of strings that are phrases of that category.

Example
– Noun(“ball”) is a true logical sentence
– Noun(“the”) is a false logical sentence

Definite Clause Grammar (DCG)

A definite clause grammar (DCG) is a grammar in which every sentence must be a definite clause.

A definite clause is a type of Horn clause that, when written as an implication, has exactly one atom in the conclusion and a conjunction of zero or more atoms in the hypothesis, for example A1 ∧ A2 ∧ … ⇒ C1

Example 1

In BNF notation, we have: S → NP VP

In First-Order Logic notation, we have: NP(s1) ∧ VP(s2) ⇒ S(Append(s1, s2))

We read: If there is a string s1 that is a noun phrase and a string s2 that is a verb phrase, then the string formed by appending them together is a sentence.

Example 2

In BNF notation, we have: Noun → ball | book

In First-Order Logic notation, we have: (s = “ball” ∨ s = “book”) ⇒ Noun(s)

We read: If s is the string “ball” or the string “book”, then the string s is a noun.

Rules to Translate BNF into DCG

BNF            DCG
X → Y Z        Y(s1) ∧ Z(s2) ⇒ X(Append(s1, s2))
X → word       X(["word"])
X → Y | Z      Y(s) ⇒ X(s)    Z(s) ⇒ X(s)
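
The three translation rules can be mimicked in Python by treating each nonterminal as a predicate over a list of words. This is only a sketch: `word`, `seq` and `alt` are hypothetical combinators, and the grammar built from them at the end is an assumed fragment, not the one from the lecture.

```python
def splits(s):
    """All ways of writing the list s as Append(s1, s2)."""
    return [(s[:i], s[i:]) for i in range(len(s) + 1)]


def word(w):
    """X -> word   becomes   X(["word"])."""
    return lambda s: s == [w]


def seq(y, z):
    """X -> Y Z   becomes   Y(s1) and Z(s2) => X(Append(s1, s2))."""
    return lambda s: any(y(s1) and z(s2) for s1, s2 in splits(s))


def alt(*options):
    """X -> Y | Z   becomes   Y(s) => X(s) and Z(s) => X(s)."""
    return lambda s: any(option(s) for option in options)


# Assumed grammar fragment built with the three combinators.
noun = alt(word("ball"), word("book"))
article = word("the")
verb = word("hit")
noun_phrase = alt(noun, seq(article, noun))
verb_phrase = alt(verb, seq(verb, noun_phrase))
sentence = seq(noun_phrase, verb_phrase)
```

With these definitions `noun(["ball"])` is true and `noun(["the"])` is false, matching the Noun(“ball”) and Noun(“the”) examples given earlier, and `sentence(["the", "ball", "hit", "the", "book"])` is true.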

Augmenting the DCG

Extend the notation to incorporate grammars that cannot be expressed in BNF.

Nonterminal symbols can be augmented with extra arguments.

Augmenting the DCG: Add one argument for semantics

In DCG, the nonterminal NP translates as a one-place predicate where the single argument is a string: NP(s)

In the augmented DCG, we can write NP(sem) to express “an NP with semantics sem”. This gets translated into logic as the two-place predicate NP(sem, s)

Augmenting the DCG: Add one argument for semantics

DCG:    S(sem) → NP(sem1) VP(sem2) {compose(sem1, sem2, sem)}
FOPL:   NP(s1, sem1) ∧ VP(s2, sem2) ⇒ S(append(s1, s2)), compose(sem1, sem2, sem)
PROLOG: See later on

Semantic Interpretation

Compositional semantics - the semantics of any phrase is a function of the semantics of its subphrases; it does not depend on any other phrase before, after, or encompassing the given phrase.

But natural languages do not have a compositional semantics in the general case.

% A sentence is a noun phrase followed by a verb phrase; its semantics
% is the list whose head is the NP semantics and whose tail is the VP semantics.
sentence(S, Sem) :-
    np(S1, Sem1), vp(S2, Sem2),
    append(S1, S2, S),
    Sem = [Sem1 | Sem2].

% Noun phrase: article + noun; the semantics comes from the noun.
np([S1, S2], Sem) :- article(S1), noun(S2, Sem).

% Verb phrases: a verb alone, verb + color adjective, or verb + noun.
vp([S], Sem) :- verb(S, Sem1), Sem = [property, Sem1].
vp([S1, S2], Sem) :- verb(S1), adjective(S2, color, Sem1), Sem = [color, Sem1].
vp([S1, S2], Sem) :- verb(S1), noun(S2, Sem1), Sem = [parts, Sem1].

Problems with Augmented DCG

The previous grammar will generate sentences that are not grammatically correct.

Natural language (NL) is not a context-free language.

Must deal with
– cases
– agreement between subject and main verb in the sentence (predicate)
– verb subcategorization: the complements that a verb can accept

Solution

Augment the existing rules of the grammar to deal with context issues.

Start by parameterizing the categories NP and Pronoun so that they take a parameter indicating their case.

CASES

Nominative case (subjective case) + agreement
I take the bus           Je prends l’autobus       Eu iau autobuzul
You take the bus         Tu prends l’autobus       Tu iei autobuzul
He takes the bus         Il prend l’autobus        El ia autobuzul

Accusative case (objective case)
He gives me the book     Il me donne le livre      El imi da cartea

Dative case
He is talking to me      Il parle avec moi         El vorbeste cu mine

Example - The Grammar Using Augmentations to Represent Noun Cases

S → NP(Subjective) VP
NP(case) → Pronoun(case) | Noun | Article Noun
Pronoun(Subjective) → I | you | he | she
Pronoun(Objective) → me | you | him | her

sentence(S) :- np(S1, subjective), vp(S2), append(S1, S2, S).

np([S], Case) :- pronoun(S, Case).
np([S], _) :- noun(S).
np([S1, S2], _) :- article(S1), noun(S2).

pronoun(i, subjective).
pronoun(you, _).
pronoun(he, subjective).
pronoun(she, subjective).
pronoun(me, objective).
pronoun(him, objective).
pronoun(her, objective).
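
The effect of the case parameter can also be sketched in Python. The pronoun table follows the Prolog facts; the noun and verb lists and the shape of the verb phrase (a verb optionally followed by an objective-case NP) are assumptions added only to make the example self-contained.

```python
PRONOUNS = {
    "i": {"subjective"},
    "you": {"subjective", "objective"},  # pronoun(you, _) accepts both cases
    "he": {"subjective"},
    "she": {"subjective"},
    "me": {"objective"},
    "him": {"objective"},
    "her": {"objective"},
}
ARTICLES = {"the", "a", "an"}
NOUNS = {"bus", "book"}       # hypothetical nouns
VERBS = {"takes", "sees"}     # hypothetical verbs


def is_np(words, case):
    """NP(case) -> Pronoun(case) | Noun | Article Noun."""
    if len(words) == 1:
        if words[0] in PRONOUNS:
            return case in PRONOUNS[words[0]]
        return words[0] in NOUNS
    return len(words) == 2 and words[0] in ARTICLES and words[1] in NOUNS


def is_vp(words):
    """Assumed VP: a verb, optionally followed by an objective-case NP."""
    if not words or words[0] not in VERBS:
        return False
    return len(words) == 1 or is_np(words[1:], "objective")


def is_sentence(words):
    """S -> NP(Subjective) VP, trying every split point."""
    return any(is_np(words[:i], "subjective") and is_vp(words[i:])
               for i in range(1, len(words)))
```

Here `is_sentence(["he", "sees", "her"])` is accepted and `is_sentence(["him", "sees", "she"])` is rejected. Subject-verb agreement is not modeled; it would need one more parameter.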

Verb Subcategorization

Augment the DCG with a new parameter to describe the verb subcategorization.

The grammar must state which verbs can be followed by which other categories. This is the subcategorization information for the verb.

Each verb has a list of complements.

Integrate Verb Subcategorization into the Grammar

A subcategorization list is a list of complement categories that the verb accepts.

Augment the category VP to take a subcategorization argument that indicates the complements that are needed to form a complete VP.

Integrate Verb Subcategorization into the Grammar

Change the rule for S to say that it requires a verb phrase that has all its complements, and thus a subcategorization list of [ ].

Rule: S → NP(Subjective) VP([ ])
– The rule can be read as “A sentence can be composed of a NP in the subjective case, followed by a VP which has a null subcategorization list”

Integrate Verb Subcategorization into the Grammar

– Verb phrases can take adjuncts, which are phrases that are not licensed by the individual verb, but rather may appear in any verb phrase
– Phrases representing time and place are adjuncts, because almost any action or event can have a time or a place

VP(subcat) → VP(subcat) PP | VP(subcat) Adverb

Example: I smell the wumpus now

VP(subcat) → VP([NP | subcat]) NP(Objective)
           | VP([Adjective | subcat]) Adjective
           | VP([PP | subcat]) PP
           | Verb(subcat)
           | VP(subcat) PP
           | VP(subcat) Adverb

The first line can be read as “A VP, with a given subcategorization list, subcat, can be formed by a VP followed by a NP in the objective case, as long as that VP has a subcategorization list that starts with the symbol NP and is followed by the elements of the list subcat”

Verb       Subcat         Example verb phrase
give       [NP, PP]       give the gold in box to me
           [NP, NP]       give me the gold
smell      [NP]           smell a wumpus
           [Adjective]    smell awful
           [PP]           smell like a wumpus
is         [Adjective]    is smelly
           [PP]           is in box
           [NP]           is a pit
died       []             died
believe    [S]            believe the wumpus is dead
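
The table of verbs and subcategorization lists can be read as data. The Python sketch below assumes the complements of a verb have already been recognized and labeled with their categories; it consumes them one at a time, as the VP rules do, and accepts the verb phrase only when an empty subcategorization list remains. The function names are hypothetical.

```python
SUBCATS = {
    "give": [["NP", "PP"], ["NP", "NP"]],
    "smell": [["NP"], ["Adjective"], ["PP"]],
    "is": [["Adjective"], ["PP"], ["NP"]],
    "died": [[]],
    "believe": [["S"]],
}


def remaining_subcats(verb, complements):
    """Subcat lists still pending after the given complement categories."""
    pending = SUBCATS.get(verb, [])
    for category in complements:
        # Keep only the lists that expect this category next, minus that item.
        pending = [rest[1:] for rest in pending if rest and rest[0] == category]
    return pending


def is_complete_vp(verb, complements):
    """True if some subcat list of the verb is used up exactly."""
    return [] in remaining_subcats(verb, complements)
```

For instance, `remaining_subcats("give", ["NP"])` leaves `[["PP"], ["NP"]]`, so "give the gold" is not yet a complete verb phrase, while `is_complete_vp("give", ["NP", "PP"])` holds.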

VP(subcat) → VP([NP | subcat]) NP(Objective)
           | VP([Adjective | subcat]) Adjective
           | VP([PP | subcat]) PP
           | Verb(subcat)
           | VP(subcat) PP
           | VP(subcat) Adverb

% First rule: a VP that still expects an NP absorbs an objective-case NP,
% leaving the remaining complements Subcat.
vp(S, Subcat) :-
    vp(S1, [np | Subcat]), np(S2, objective),
    append(S1, S2, S).

% Bare verbs, as one-word lists, with their subcategorization lists.
vp([give], [np, pp]).
vp([give], [np, np]).
vp([smell], [np]).
vp([smell], [adjective]).
vp([smell], [pp]).

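But it is dangerous to translate directly

VP(subcat) → VP(subcat) PP

The rule is left-recursive: the corresponding Prolog clause would call vp again with the same arguments before consuming any input, so the depth-first search may never terminate.

Solution: use a separate predicate, here vp1, for the verb phrase that precedes the PP

vp(S, Subcat) :- vp1(S1, Subcat), pp(S2), append(S1, S2, S).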
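
The same idea can be sketched in Python over a sequence of category labels: instead of calling the verb phrase recognizer on itself, first recognize a core verb phrase (the role vp1 appears to play) and then allow any number of trailing adjuncts. The definition of the core, a bare verb or a verb with one NP complement, is an assumption made only for this example.

```python
ADJUNCTS = {"PP", "Adverb"}


def is_core_vp(tags):
    """Stand-in for vp1 (assumed): a bare verb, or a verb plus one NP."""
    return tags in (["Verb"], ["Verb", "NP"])


def is_vp(tags):
    """VP -> VP PP | VP Adverb | core, rewritten as core (PP | Adverb)*.

    Every split point is tried, so no call ever recurses on the same input.
    """
    return any(
        is_core_vp(tags[:k]) and all(tag in ADJUNCTS for tag in tags[k:])
        for k in range(len(tags) + 1)
    )
```

The labels `["Verb", "NP", "Adverb"]`, which correspond to "smell the wumpus now", are accepted, while `["Verb", "NP", "NP"]` is rejected under the assumed core.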

Generative Capacity of Augmented Grammars

The generative capacity of augmented grammars depends on the number of values that the augmentations can take.

If that number is finite, then the augmented grammar is equivalent to a context-free grammar.
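
The equivalence for a finite number of values can be made concrete: every augmented rule is a template that can be copied once per combination of values, which yields ordinary context-free rules over renamed symbols. The Python sketch below uses a hypothetical notation in which `NP_{case}` stands for NP(case); `expand` is an illustrative name.

```python
from itertools import product


def expand(lhs, rhs, variables):
    """Instantiate one rule template for every combination of variable values.

    lhs is a template string, rhs a list of template strings, and variables
    maps each variable name to its finite list of values.
    """
    names = sorted(variables)
    rules = []
    for values in product(*(variables[name] for name in names)):
        binding = dict(zip(names, values))
        rules.append((lhs.format(**binding),
                      [symbol.format(**binding) for symbol in rhs]))
    return rules


case_values = {"case": ["Subjective", "Objective"]}
np_rules = expand("NP_{case}", ["Pronoun_{case}"], case_values)
```

Here `np_rules` contains the two plain rules NP_Subjective → Pronoun_Subjective and NP_Objective → Pronoun_Objective. With an unbounded set of values, for example arbitrary semantic terms, this expansion never finishes, which is why the equivalence requires finiteness.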

Semantic Interpretation

Semantic interpretation is responsible for producing all possible interpretations, and disambiguation is responsible for choosing the best one.

Disambiguation starts from the pragmatic interpretation of the sentence.

Pragmatic Interpretation

Complete the semantic interpretation by adding information about the current situation.

Pragmatics describes how the language is used and its effects on the listener.

Pragmatics explains why it is not appropriate to answer "Yes" to the question "Do you know what time it is?"

Indexicals

Indexical - a phrase that refers directly to the current situation

Example
– I am in Bucharest today.

Anaphora

Anaphora - the occurrence of phrases referring to objects that have been mentioned previously

Example
– John was hungry. He entered a restaurant.
– The ball hit the house. It broke the window.

Ambiguity

– Lexical Ambiguity
– Syntactic Ambiguity
– Referential Ambiguity
– Pragmatic Ambiguity

Lexical Ambiguity

A word has more than one meaning.

Examples
– A clear sky
– A clear profit
– The way is clear
– John is clear
– It is clear that ...

Syntactic Ambiguity

Can occur with or without lexical ambiguity.

Examples
– I saw the Statue of Liberty flying over New York.
– I saw John in a restaurant with a telescope.

Referential Ambiguity

Occurs because natural languages consist almost entirely of words for categories, not for individual objects.

Example
– John met Mary and Tom. They went to a restaurant.
– Block A is on block B and it is not clear.

Pragmatic Ambiguity

Occurs when the speaker and the hearer disagree on what the current situation is.

Example
– I will meet you tomorrow.