Grammars

Grammars. Languages Natural languages –language spoken or written by humans English, French, Italian, Spanish, … –English and other natural languages.

Dec 17, 2015


Drusilla Pitts
Transcript
Page 1: Grammars. Languages Natural languages –language spoken or written by humans English, French, Italian, Spanish, … –English and other natural languages.

Grammars


Languages

• Natural languages
  – language spoken or written by humans
    • English, French, Italian, Spanish, …
  – English and other natural languages clearly have structure
    • subjects, nouns, verbs, …

• In contrast we have programming languages
  – languages used to communicate with computers
    • Java, C, C++, …
  – programming languages also have structure
    • expressions, statements, methods, …


Grammars

• A formal definition of the syntactic structure of a language

• Set of rules that tell you whether a sentence is correctly structured

• A sentence is a well-formed string in the language

• Production rules specify the order of components and their sub-components in a sentence

• Top-level rule
  – one production rule designated as the “start rule”
  – provides the structure for an entire sentence


Production rules

• Production rules specify a syntactic category and assign to it a sequence of zero or more symbols

• Symbols are either terminal or non-terminal

• Terminal symbols
  – correspond to components of the sentence with no internal syntactic structure

• Non-terminal symbols
  – any symbol assigned values by a production rule


Sentence parsing and generation

• A grammar can be used to parse a sentence or to generate one

• Parsing begins with a sentence and ends with the top level rule
  – at lowest level sentence is composed of terminal symbols, first assign a terminal syntactic category to each component
  – assign non-terminal symbols to each appropriate group of terminals, up to the level of entire sentence

• Generation starts from the top-level rule and chooses one alternative production wherever there is a choice


Example

• The English grammar
  – a set of rules for combining words into well-formed phrases and clauses


Clauses

• A clause is a group of related words containing a subject and a verb
  – building blocks of sentences

• Independent clause
  – can stand alone as complete sentence
      Tammy is great!

• Dependent clause
  – unable to stand alone as complete sentence
      Although Tammy is great
  – “Although” is a dependent word: allows clause to be embedded in another sentence
      Although Tammy is great, this class is still boring.


Phrases

• A phrase is a group of related words that does not contain a subject-verb relationship
  – can consist of a single word or a group of words

• Sentences formed by noun-phrases and verb-phrases

• Noun-phrases can be of the form
  – noun
  – article noun
  – article adjective noun

• Examples:
    Dogs.
    Some dogs.
    The mean dogs.


Phrases

• Verb-phrases can be of the form
  – verb
  – verb adverb

• Examples
    bite.
    drool profusely

• A noun-phrase can be contained in a verb-phrase
    bite people
  – the noun people is the object of the verb bite


A simple English grammar

• Consider independent clauses only

• Non-terminal symbols are indicated by angle brackets

• Start rule
    <SENTENCE> => <NOUN-PHRASE> <VERB-PHRASE>

• Production rules
    <NOUN-PHRASE> => <NOUN> | <ARTICLE> <NOUN> | <ARTICLE> <ADJECTIVE> <NOUN>
    <VERB-PHRASE> => <VERB> | <VERB> <NOUN-PHRASE>

• These rules specify all non-terminal symbols
  – we haven’t specified the terminals yet


An English grammar

• The above grammar specifies the rules for generating a syntactically correct English sentence

• This means …
  – given lists of nouns, articles, adjectives, and verbs, we can generate any number of syntactically correct English sentences
  – these lists will specify the terminal symbols

<SENTENCE> => <NOUN-PHRASE> <VERB-PHRASE>
<NOUN-PHRASE> => <NOUN> | <ARTICLE> <NOUN> | <ARTICLE> <ADJECTIVE> <NOUN>
<VERB-PHRASE> => <VERB> | <VERB> <NOUN-PHRASE>


Terminal symbols

• Still need to specify terminal symbols for our grammar

• In particular, for the non-terminal symbols
  – <NOUN>
  – <VERB>
  – <ARTICLE>
  – <ADJECTIVE>

• Let’s assign them as follows:

<NOUN> => DOG | CAT | WATER
<VERB> => BIT | SNIFFED | DRANK | SCRATCHED
<ARTICLE> => THE | A
<ADJECTIVE> => STINKY | HAPPY | MEAN
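
As a rough sketch, the grammar defined so far could be held in a Python dictionary. The names `GRAMMAR`, `is_terminal`, and `terminal_symbols` are illustrative assumptions, not part of the slides. Each non-terminal maps to its list of alternatives, and any symbol without an entry is treated as a terminal.

```python
# Hypothetical encoding of the grammar above: each non-terminal maps to a
# list of alternatives, and each alternative is a space-separated string.
GRAMMAR = {
    "<SENTENCE>": ["<NOUN-PHRASE> <VERB-PHRASE>"],
    "<NOUN-PHRASE>": ["<NOUN>", "<ARTICLE> <NOUN>", "<ARTICLE> <ADJECTIVE> <NOUN>"],
    "<VERB-PHRASE>": ["<VERB>", "<VERB> <NOUN-PHRASE>"],
    "<NOUN>": ["DOG", "CAT", "WATER"],
    "<VERB>": ["BIT", "SNIFFED", "DRANK", "SCRATCHED"],
    "<ARTICLE>": ["THE", "A"],
    "<ADJECTIVE>": ["STINKY", "HAPPY", "MEAN"],
}


def is_terminal(symbol):
    """A symbol is terminal when no production rule assigns it a value."""
    return symbol not in GRAMMAR


def terminal_symbols():
    """Collect every terminal that appears on the right-hand side of a rule."""
    return sorted(
        symbol
        for alternatives in GRAMMAR.values()
        for alternative in alternatives
        for symbol in alternative.split()
        if is_terminal(symbol)
    )


print(terminal_symbols())
```

Treating "has no rule" as the definition of a terminal mirrors the slides' definition of a non-terminal as any symbol assigned values by a production rule.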


Sentence generation

• According to our grammar, the following sentences are syntactically correct:

DOG BIT

DOG BIT CAT

THE DOG BIT THE CAT

THE CAT BIT THE DOG

THE HAPPY DOG SNIFFED THE STINKY CAT

A MEAN CAT SCRATCHED THE DOG

A STINKY DOG DRANK THE WATER

THE HAPPY WATER DRANK THE STINKY DOG

THE MEAN WATER SCRATCHED THE HAPPY CAT
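
Sentences like these can be produced mechanically. The following is a minimal sketch, assuming the grammar is stored in a hypothetical `GRAMMAR` dictionary and that each choice between alternatives is made at random; the function name `generate` is likewise an assumption for illustration.

```python
import random

# Hypothetical encoding of the slides' grammar (alternatives as strings).
GRAMMAR = {
    "<SENTENCE>": ["<NOUN-PHRASE> <VERB-PHRASE>"],
    "<NOUN-PHRASE>": ["<NOUN>", "<ARTICLE> <NOUN>", "<ARTICLE> <ADJECTIVE> <NOUN>"],
    "<VERB-PHRASE>": ["<VERB>", "<VERB> <NOUN-PHRASE>"],
    "<NOUN>": ["DOG", "CAT", "WATER"],
    "<VERB>": ["BIT", "SNIFFED", "DRANK", "SCRATCHED"],
    "<ARTICLE>": ["THE", "A"],
    "<ADJECTIVE>": ["STINKY", "HAPPY", "MEAN"],
}


def generate(symbol, rng):
    """Return the list of words produced by expanding `symbol`."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to expand
    # Choose one alternative production wherever there is a choice.
    alternative = rng.choice(GRAMMAR[symbol])
    words = []
    for part in alternative.split():
        words.extend(generate(part, rng))
    return words


rng = random.Random(0)  # seeded only so that repeated runs agree
for _ in range(3):
    print(" ".join(generate("<SENTENCE>", rng)))
```

Nothing in this sketch checks meaning, so it is as willing to print a sentence about HAPPY WATER as one about a HAPPY DOG.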


Generation

THE HAPPY DOG SNIFFED THE STINKY CAT

<SENTENCE>

<NOUN-PHRASE> <VERB-PHRASE>

<ARTICLE> <ADJECTIVE> <NOUN> <VERB> <NOUN-PHRASE>

<ARTICLE> <ADJECTIVE> <NOUN> <VERB> <ARTICLE> <ADJECTIVE> <NOUN>

THE HAPPY DOG SNIFFED THE STINKY CAT


Generation

THE HAPPY WATER DRANK THE STINKY DOG

<SENTENCE>

<NOUN-PHRASE> <VERB-PHRASE>

<ARTICLE> <ADJECTIVE> <NOUN> <VERB> <NOUN-PHRASE>

<ARTICLE> <ADJECTIVE> <NOUN> <VERB> <ARTICLE> <ADJECTIVE> <NOUN>

THE HAPPY WATER DRANK THE STINKY DOG


Generation

• Q: How many different sentences can be generated by this grammar?

• A: 3024 (27 possible noun-phrases multiplied by 112 possible verb-phrases)

<SENTENCE> => <NOUN-PHRASE> <VERB-PHRASE>

<NOUN-PHRASE> => <NOUN> | <ARTICLE> <NOUN> | <ARTICLE> <ADJECTIVE> <NOUN>
<VERB-PHRASE> => <VERB> | <VERB> <NOUN-PHRASE>
<NOUN> => DOG | CAT | WATER
<VERB> => BIT | SNIFFED | DRANK | SCRATCHED
<ARTICLE> => THE | A
<ADJECTIVE> => STINKY | HAPPY | MEAN
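
The count can be checked by brute force. By hand: there are 3 + 2·3 + 2·3·3 = 27 noun-phrases and 4 + 4·27 = 112 verb-phrases, so 27 · 112 = 3024 sentences. The sketch below enumerates every expansion instead of multiplying; `GRAMMAR` and `expansions` are hypothetical names, and the approach only terminates because this grammar has no recursive rule.

```python
from itertools import product

# Hypothetical encoding of the slides' grammar (alternatives as strings).
GRAMMAR = {
    "<SENTENCE>": ["<NOUN-PHRASE> <VERB-PHRASE>"],
    "<NOUN-PHRASE>": ["<NOUN>", "<ARTICLE> <NOUN>", "<ARTICLE> <ADJECTIVE> <NOUN>"],
    "<VERB-PHRASE>": ["<VERB>", "<VERB> <NOUN-PHRASE>"],
    "<NOUN>": ["DOG", "CAT", "WATER"],
    "<VERB>": ["BIT", "SNIFFED", "DRANK", "SCRATCHED"],
    "<ARTICLE>": ["THE", "A"],
    "<ADJECTIVE>": ["STINKY", "HAPPY", "MEAN"],
}


def expansions(symbol):
    """List every terminal string that `symbol` can expand to."""
    if symbol not in GRAMMAR:
        return [symbol]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each part, then combine one choice per part in order.
        parts = [expansions(part) for part in alternative.split()]
        results.extend(" ".join(words) for words in product(*parts))
    return results


noun_phrases = set(expansions("<NOUN-PHRASE>"))
verb_phrases = set(expansions("<VERB-PHRASE>"))
sentences = set(expansions("<SENTENCE>"))
print(len(noun_phrases), len(verb_phrases), len(sentences))  # 27 112 3024
```

Collecting the results in sets confirms that the 3024 derivations are also 3024 distinct sentences.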


Parsing

• Process of taking a sentence and fitting it to a grammar

• Parsing English is complex due to context dependence

• Natural language understanding is one of the hardest problems of artificial intelligence
  – human language is complex, irregular and diverse
  – philosophical problems of meaning

DOG BIT CAT
<NOUN> <VERB> <NOUN>
<NOUN-PHRASE> <VERB-PHRASE>
<SENTENCE>


Recognition by parsing

• Grammars are used to recognize syntactically correct sentences

• Example

Fit the following sentence to the given grammar:

THE HAPPY DOG SNIFFED THE STINKY CAT

<NOUN> => DOG | CAT | WATER
<VERB> => BIT | SNIFFED | DRANK | SCRATCHED
<ARTICLE> => THE | A
<ADJECTIVE> => STINKY | HAPPY | MEAN


Parsing example

THE HAPPY DOG SNIFFED THE STINKY CAT

<ARTICLE> <ADJECTIVE> <NOUN> <VERB> <ARTICLE> <ADJECTIVE> <NOUN>

<NOUN-PHRASE> <VERB> <NOUN-PHRASE>

<NOUN-PHRASE> <VERB-PHRASE>

<SENTENCE>
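
Recognition can also be sketched in code. Note that the example above works bottom-up, from words to <SENTENCE>, whereas the sketch below goes top-down from <SENTENCE> and backtracks over the alternatives; for this small grammar both approaches accept the same sentences. The names `GRAMMAR`, `match`, and `is_sentence` are assumptions made for illustration.

```python
# Hypothetical encoding of the slides' grammar (alternatives as strings).
GRAMMAR = {
    "<SENTENCE>": ["<NOUN-PHRASE> <VERB-PHRASE>"],
    "<NOUN-PHRASE>": ["<NOUN>", "<ARTICLE> <NOUN>", "<ARTICLE> <ADJECTIVE> <NOUN>"],
    "<VERB-PHRASE>": ["<VERB>", "<VERB> <NOUN-PHRASE>"],
    "<NOUN>": ["DOG", "CAT", "WATER"],
    "<VERB>": ["BIT", "SNIFFED", "DRANK", "SCRATCHED"],
    "<ARTICLE>": ["THE", "A"],
    "<ADJECTIVE>": ["STINKY", "HAPPY", "MEAN"],
}


def match(symbol, tokens, pos):
    """Yield every position where `symbol` can end if it starts at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must equal the next word of the sentence.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        positions = [pos]
        for part in alternative.split():
            # Advance every candidate position past `part`.
            positions = [end for p in positions for end in match(part, tokens, p)]
        yield from positions


def is_sentence(text):
    """True when the whole text can be derived from <SENTENCE>."""
    tokens = text.split()
    return any(end == len(tokens) for end in match("<SENTENCE>", tokens, 0))


print(is_sentence("THE HAPPY DOG SNIFFED THE STINKY CAT"))  # True
print(is_sentence("HAPPY DOG BIT"))                         # False
```

The second call fails because no noun-phrase alternative starts with an adjective.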


Optional parts of speech

• The given grammar provided options for phrases

<SENTENCE>

<NOUN-PHRASE> <VERB-PHRASE>

<ARTICLE> <NOUN> <VERB> <NOUN-PHRASE>

<ARTICLE> <NOUN> <VERB> <ARTICLE> <NOUN>

THE DOG SNIFFED THE CAT

<SENTENCE> => THE DOG SNIFFED THE CAT


Syntax and semantics

• Syntax only tells you if the sentence is constructed correctly

• Semantics tells you whether a correctly structured sentence makes any sense

• The sentences
    THE HAPPY WATER DRANK THE STINKY DOG
    THE MEAN WATER SCRATCHED THE HAPPY CAT
  are correct syntactically but something appears wrong …

• WATER usually isn’t happy or mean and usually doesn’t drink or scratch either


Formal specifications

• Need a precise notation of syntax of a language
  – grammars can be used for generation and parsing

• Context-free grammars

• Substitute as many times as necessary

• All legal statements can be generated this way

<name> => sequence of letters and/or digits that begins with a letter

<name> => gikB

<name> => msg42
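
The informal <name> rule can be spelled out as productions, for instance <name> => <letter> | <name> <letter> | <name> <digit>. That formalisation is an assumption, since the slides give the rule only in words. The sketch below checks a string against those productions directly; `is_name` is a hypothetical helper, and "letter" is assumed to mean an ASCII letter.

```python
import string

LETTERS = set(string.ascii_letters)  # assumption: ASCII letters only
DIGITS = set(string.digits)


def is_name(text):
    """Check <name> => <letter> | <name> <letter> | <name> <digit>."""
    if len(text) == 1:
        return text in LETTERS  # base case: a single letter
    if len(text) > 1:
        # The last character is a letter or digit and the rest is a <name>.
        return text[-1] in (LETTERS | DIGITS) and is_name(text[:-1])
    return False  # the empty string is not a name


for candidate in ["gikB", "msg42", "42msg"]:
    print(candidate, is_name(candidate))
```

The recursive call plays the role of "substitute as many times as necessary": each step peels off one character until only the leading letter remains.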


Context-free grammar

• How do we get this from our grammar?
    person = firstname + " " + lastname;

• Unlike natural languages such as English, all the legal strings in a programming language can be specified using a context-free grammar


Recursive Sentence Generator (RSG)

• Constructs sentences, paragraphs, and even papers that fit a prescribed format

• RSG demo applet, courtesy of Prof. Forbes, Duke CS
    http://www.duke.edu/web/cps001/code/RSG.html

• The format is specified by a user-defined grammar

• You will define your own grammars in Lab 9

• Some example grammars are here:
    http://www.duke.edu/web/cps001/code/grammars/

• We will go over the poem grammar (Poem.g)


RSG syntax

• Production rules enclosed in curly braces {}

• Nonterminals are enclosed in angle brackets

• Terminals are plain text

• First line of production rule specifies syntactic category

• All lines that follow specify options for that category
  – options are separated by semicolons

• Must specify top-level rule with nonterminal <start>

{
<start>
your top level rule here. can be as many sentences as you like. ;
}


Poem.g

{
<start>
The <object> <verb> tonight. ;
}

{
<object>
waves ;
big yellow flowers ;
slugs ;
}

{
<verb>
sigh <adverb> ;
portend like <object> ;
}

{
<adverb>
warily ;
grumpily and <adverb> ;
}


Poem.g

• Top-level rule specifies one sentence with 2 nonterminals
    { <start> The <object> <verb> tonight. ;}

• Nonterminal <object> provides three options, all terminal
    { <object> waves ; big yellow flowers ; slugs ;}


Poem.g

• Nonterminals can refer to other nonterminals and be combinations of terminals and nonterminals

• Nonterminal <verb> refers to the nonterminals <adverb> and <object>

• Nonterminal <object> is already defined

• Need to define <adverb>

{
<verb>
sigh <adverb> ;
portend like <object> ;
}


Poem.g

• Nonterminals can refer to themselves

• Nonterminal <adverb> has two options
  – first is terminal
  – second refers to <adverb>

• What would happen if there was no terminal option?

{
<adverb>
warily ;
grumpily and <adverb> ;
}
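
One way to explore the self-referencing rule is to list what <adverb> can expand to when the nesting of rule uses is capped. This is only a sketch: `expansions`, the `depth` cap, and the `NO_BASE_CASE` variant (the same rule with its terminal option removed) are hypothetical and not part of RSG.

```python
from itertools import product

ADVERB = {"<adverb>": ["warily", "grumpily and <adverb>"]}
NO_BASE_CASE = {"<adverb>": ["grumpily and <adverb>"]}  # hypothetical variant


def expansions(grammar, symbol, depth):
    """Strings derivable from `symbol` with at most `depth` nested rule uses."""
    if symbol not in grammar:
        return [symbol]  # terminal
    if depth == 0:
        return []  # depth budget used up before the expansion finished
    results = []
    for option in grammar[symbol]:
        parts = [expansions(grammar, token, depth - 1) for token in option.split()]
        results.extend(" ".join(words) for words in product(*parts))
    return results


for line in expansions(ADVERB, "<adverb>", 3):
    print(line)
print(expansions(NO_BASE_CASE, "<adverb>", 10))  # []
```

With the terminal option present, each extra level of depth adds one longer adverb chain. Without it, no finished string appears at any depth tried, which suggests what an uncapped generator would run into.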


Generating a poem

<start>

• All sentences start with <start>

• Only one production in the definition of <start>
    The <object> <verb> tonight.

• Expand each grammar element from left to right
  – The is a terminal, so it is simply printed
  – <object> is a non-terminal, so it must be expanded
      Choose one:
        • waves
        • big yellow flowers
        • slugs
      Suppose slugs is chosen


Generating a poem

The slugs <verb> tonight.
  – <verb> is a non-terminal, so it must be expanded
      Choose one:
        • sigh <adverb>
        • portend like <object>
      Suppose sigh <adverb> is chosen

The slugs sigh <adverb> tonight.
  – <adverb> is a non-terminal, so it must be expanded
      Choose one:
        • warily
        • grumpily and <adverb>
      Suppose warily is chosen


A complete poem

The slugs sigh warily tonight.

  – The terminal tonight. is simply printed
  – There are no more non-terminals to expand!
  – The grammar has generated a complete poem

Question:
  – Why is this called a recursive sentence generator?
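
The walkthrough above can be replayed in a few lines. This is a sketch of the idea rather than the RSG applet's actual code: the `POEM` dictionary, `expand`, and the `scripted` chooser are all hypothetical names, and the grammar is typed in by hand instead of being read from Poem.g.

```python
import random

# Poem.g re-typed as a dictionary: non-terminal -> list of options.
POEM = {
    "<start>": ["The <object> <verb> tonight."],
    "<object>": ["waves", "big yellow flowers", "slugs"],
    "<verb>": ["sigh <adverb>", "portend like <object>"],
    "<adverb>": ["warily", "grumpily and <adverb>"],
}


def expand(symbol, choose=random.choice):
    """Expand a non-terminal left to right; terminals are copied as they are."""
    option = choose(POEM[symbol])
    words = []
    for token in option.split():
        if token in POEM:
            words.append(expand(token, choose))  # non-terminal: expand it too
        else:
            words.append(token)  # terminal: simply printed
    return " ".join(words)


def scripted(indices):
    """Build a chooser that replays a fixed list of option indices."""
    remaining = iter(indices)
    return lambda options: options[next(remaining)]


# Replay the choices made above: slugs, then sigh <adverb>, then warily.
print(expand("<start>", scripted([0, 2, 0, 0])))  # The slugs sigh warily tonight.
print(expand("<start>"))  # a randomly chosen poem
```

Passing the chooser as a parameter is just a convenience for reproducing one particular derivation; with the default `random.choice` every run may print a different poem.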


More poems

• Go to the RSG demo applet and select the poem grammar to generate more poems

    http://www.duke.edu/web/cps001/code/RSG.html

• How many different poems are possible?