
Formal Languages · Formal Grammars · Regular Languages · Formal complexity of Natural Languages · References

Formal Languages applied to Linguistics

Pascal Amsili

Laboratoire de Linguistique Formelle, Université Paris Diderot

U. São Carlos, September 2014


Overview

1 Formal Languages

2 Formal Grammars

3 Regular Languages

4 Formal complexity of Natural Languages
  • Introduction
  • Are NL regular?
  • Are NL context-free?
  • Are NL context-sensitive?
  • Syntactic formalisms

Introduction

Motivation

• It gives us knowledge about the structure of natural languages.
• It helps us assess the adequacy of linguistic formalisms.
• It gives bounds on the complexity of NLP tasks.
• It provides us with predictions about human language processing.


Hypotheses

• We can talk about “natural language” in general: all languages have a similar structure and a similar expressive power.
• Natural languages are recursively enumerable, i.e. they are formal languages.
• Natural languages are infinite.

⇒ Under those hypotheses, it is possible to ask the question: what is the complexity of natural languages?


An infinite number of sentences

1 Arbitrarily long sentences can be built by adding new material:

(4) A stranger arrived.
    A tall stranger arrived.
    A tall handsome stranger arrived.
    A dark tall handsome stranger arrived.
    A dark tall handsome stranger arrived suddenly.

2 More interestingly, arbitrarily long sentences can be built through center-embedding. In this case, there is a dependency between elements that are arbitrarily far apart:

(5) The cats hunt.
    The cats the neighbor owns hunt.
    The cats the neighbor who arrived owns hunt.

center-embedding: embedding a phrase in the middle of another phrase of the same type


An infinite number of sentences (cont’d)

Consider the 3 structures:

• If $S_1$, then $S_2$.
• Either $S_1$ or $S_2$.
• The man who said $S_1$ is coming today.

1 The paired items (if ... then, either ... or, the man ... is) depend on each other.
2 It is possible to create nested sentences of arbitrary length:

(6) If either the man who said $S_a$ is coming today, or $S_b$, then $S_c$.

⇒ Looking at the various ways of forming arbitrarily long sentences gives access to complexity.

Are NL regular?

Preliminaries: a word on lexicon

(7) A dark tall handsome stranger arrived suddenly.

[Figure: transition diagram over words, with labels a, dark, tall, handsome, stranger, arrived, suddenly]

Let’s leave aside lexicon issues:

[Figure: transition diagram over categories, with labels D, Adj, N, V, Adv]
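Leaving the lexicon aside amounts to checking sequences of categories rather than sequences of words. A minimal sketch of that idea follows; the toy lexicon and the category pattern `D Adj* N V Adv?` are assumptions made for illustration (the original diagrams are not recoverable from the transcript).

```python
import re

# Hypothetical toy lexicon (word -> category); not taken from the slides.
LEXICON = {
    "a": "D", "dark": "Adj", "tall": "Adj", "handsome": "Adj",
    "stranger": "N", "arrived": "V", "suddenly": "Adv",
}

# Assumed regular pattern over categories: D Adj* N V Adv?
PATTERN = re.compile(r"D( Adj)* N V( Adv)?")

def categories(sentence):
    """Map each word to its category; raises KeyError on unknown words."""
    words = sentence.lower().rstrip(".").split()
    return " ".join(LEXICON[w] for w in words)

def accepts(sentence):
    """True if the category sequence matches the regular pattern."""
    return PATTERN.fullmatch(categories(sentence)) is not None
```

With this sketch, `accepts("A stranger arrived.")` holds, and adding adjectives never changes the verdict because the pattern only sees `Adj` repeated.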


Chomsky’s first attempt

Consider the 3 structures:

• If $S_1$, then $S_2$.
• Either $S_1$ or $S_2$.
• The man who said $S_1$ is coming today.

1 The paired items (if ... then, either ... or, the man ... is) depend on each other.
2 It is possible to create nested sentences of arbitrary length:

(8) If either the man who said $S_a$ is coming today, or $S_b$, then $S_c$.

Since such sentences are instances of mirroring and since the mirror language is not regular, then English is not regular (Chomsky, 1957, p. 22).

Fallacious argument: a regular language may contain a non-regular sublanguage. For instance, $\Sigma^*$ is regular and contains every language over $\Sigma$.


Classical argument I

Let’s consider the sentences:

(9) A man fired another man.
    A man that a man hired fired another man.
    A man that a man that a man hired hired fired another man.
    = A man (that a man)$^2$ (hired)$^2$ fired another man.

The sentences (10) are all well-formed (for any $n$).

(10) A man (that a man)$^n$ (hired)$^n$ fired another man.


Classical Argument II

Let
$x$ = that a man
$y$ = hired
$w$ = a man
$v$ = fired another man

• $wx^*y^*v$ is regular.
• $\text{English} \cap wx^*y^*v = wx^ny^nv$, i.e. the sentences (10).
• If English is regular, then $wx^ny^nv$ must be regular (for the intersection of two regular languages is regular).
• But $wx^ny^nv$ is not regular (pumping lemma).
• Contradiction ⇒ English is not regular.

(Shieber, 1985)
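The intersection step can be illustrated with a small sketch. The regular language $wx^*y^*v$ is written as a regular expression; the function `grammatical` is only a stand-in for English judgments on this one family of sentences, and it assumes (following (10)) that each "that a man" needs exactly one "hired". All function names are hypothetical.

```python
import re

W, X, Y, V = "a man", "that a man", "hired", "fired another man"

# The regular language w x* y* v as a regular expression.
REGULAR = re.compile(rf"{W}(?: {X})*(?: {Y})* {V}")

def sentence(i, j):
    """Build w x^i y^j v as a space-separated string."""
    return " ".join([W] + [X] * i + [Y] * j + [V])

def grammatical(s):
    """Stand-in for English on this family only (assumption):
    as many 'that a man' as 'hired'."""
    return s.count(X) == s.count(Y)

def in_intersection(s):
    """Membership in English ∩ w x* y* v under the assumption above."""
    return REGULAR.fullmatch(s) is not None and grammatical(s)
```

The filter `REGULAR` accepts `sentence(2, 1)`, but the intersection keeps only the strings with matching counts, which is exactly the family that a finite automaton cannot recognize.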


Discussion

Counter-arguments:

• Natural languages are finite
  - productivity doesn’t seem to be bounded
  - a list of all possible sentences, supposedly finite, is still too long for a human to learn
• People are bad at interpreting embedding: there might be a limit
  - there are indeed constraints on performance,
  - but in writing, or with an appropriate intonation, there doesn’t seem to be a hard-wired limit


Examples

Bad examples:

(11) A girl that the man that the doctor knows likes was fired.

Good examples:

(12) A foreman that an employee who was recently hired talked with was fired.

Are NL context-free?

Pumping lemma: intuition

1 If a word is long enough, then there is (at least) one nonterminal symbol appearing several times in its derivation.

“long enough”?

$S \to A\,B$
$A \to abaccabca \mid abSba$
$B \to ccccc$

Minimal length: 14:

$S \to AB \to abaccabcaB \to abaccabcaccccc$
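The minimal length can be checked mechanically. The sketch below encodes the three rules as strings (uppercase letters are nonterminals, an encoding chosen here for illustration) and computes the shortest derivable word length for each nonterminal by iterating to a fixed point.

```python
# Grammar from the slide; uppercase letters are nonterminals.
RULES = {
    "S": ["AB"],
    "A": ["abaccabca", "abSba"],
    "B": ["ccccc"],
}

def shortest_lengths(rules):
    """Length of the shortest terminal word derivable from each
    nonterminal, computed by iterating until nothing improves."""
    best = {nt: float("inf") for nt in rules}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in rules.items():
            for rhs in alternatives:
                # Terminals count 1, nonterminals count their current best.
                length = sum(best[c] if c in rules else 1 for c in rhs)
                if length < best[nt]:
                    best[nt] = length
                    changed = True
    return best
```

Any word of this language other than the shortest one has to go through the rule with `abSba`, so its derivation uses `S` at least twice, which is the repetition the intuition relies on.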


Pumping lemma: intuition

2 Let’s call this nonterminal symbol $A$.

[Figure: derivation tree in which a node $A$ dominates $u\,A\,v$, and the inner $A$ dominates $z$]

$A \xrightarrow{*} uAv$

$A \xrightarrow{*} uAv \xrightarrow{*} uzv$

$A \xrightarrow{*} uAv \xrightarrow{*} uuAvv \xrightarrow{*} \underbrace{u \ldots u}_{n}\; z\; \underbrace{v \ldots v}_{n}$


Pumping Lemma for CF languages

Def. 19 (Star lemma – CF languages)
If $L$ is context-free, there exists $p \in \mathbb{N}$ such that:
$\forall w$ s.t. $|w| \geq p$, $w$ can be factorized $w = rstuv$, with:
• $|su| \geq 1$
• $|stu| \leq p$
• $\forall i \geq 0,\; rs^itu^iv \in L$

(Bar-Hillel et al., 1961)
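The conditions of the lemma can be explored by brute force on a single word. The sketch below enumerates every factorization $w = rstuv$ with $|su| \geq 1$ and $|stu| \leq p$ and tests a few values of $i$. It is an illustration only: the helper names are hypothetical, a finite range of $i$ can only give evidence of pumpability, and a real non-context-freeness proof must handle every $p$, not one chosen value.

```python
def factorizations(word, p):
    """Yield all (r, s, t, u, v) with word = rstuv, |su| >= 1, |stu| <= p."""
    n = len(word)
    for a in range(n + 1):                      # r = word[:a]
        for d in range(a, min(a + p, n) + 1):   # v = word[d:], so |stu| <= p
            for b in range(a, d + 1):           # s = word[a:b]
                for c in range(b, d + 1):       # t = word[b:c], u = word[c:d]
                    if (b - a) + (d - c) >= 1:
                        yield word[:a], word[a:b], word[b:c], word[c:d], word[d:]

def can_pump(word, p, member, max_i=3):
    """True if some factorization keeps r s^i t u^i v in the language
    for every i in 0..max_i (finite check only)."""
    return any(
        all(member(r + s * i + t + u * i + v) for i in range(max_i + 1))
        for r, s, t, u, v in factorizations(word, p)
    )

def is_anbn(w):
    n = len(w) // 2
    return w == "a" * n + "b" * n

def is_anbncn(w):
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n
```

For `"aaabbb"` with `p = 2` a pumpable factorization exists (pump one `a` together with one `b`), whereas for `"aaabbbccc"` with `p = 3` every factorization already fails at `i = 0`.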


Pumping lemma: Consequences

The pumping lemma gives us a tool to prove that a language is not context-free.

• $L$ context-free $\Rightarrow$ pumping lemma ($\forall i,\; rs^itu^iv \in L$)
• pumping lemma $\not\Rightarrow$ $L$ context-free

To prove that $L$ is
• context-free: provide a type 2 grammar
• not context-free: show that the pumping lemma does not hold for $L$


Results: expressivity

• The language of well-parenthesized words (Dyck language) is context-free
  $S \to (S)S \mid \varepsilon$
• $a^nb^n$ ($n \geq 0$) is a context-free language
  $S \to aSb \mid \varepsilon$
• $ww^R$, $w \in \Sigma^*$ (mirror language) is a context-free language
  $S \to aSa \mid bSb \mid \varepsilon$
• $ww$, $w \in \Sigma^*$ (copy language) is not context-free
  proof: pumping lemma
• $a^nb^nc^n$ is not context-free
  proof: pumping lemma
• $a^mb^nc^md^n$ is not context-free
  proof: pumping lemma
• $xa^mb^nyc^md^nz$ is not context-free
  proof: pumping lemma
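The three grammars above can be tested by enumerating the short words they derive. The sketch below is a naive enumerator written for this purpose: it assumes single uppercase letters are nonterminals, rewrites the leftmost nonterminal, and prunes forms with too many terminals. It terminates on these grammars but is not meant as a general-purpose tool (a rule such as S → SS would make it loop).

```python
from collections import deque

def generate(rules, start, max_len):
    """Return all terminal words of length <= max_len derivable from start.
    Uppercase letters are nonterminals; '' encodes the empty word."""
    words, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

DYCK = {"S": ["(S)S", ""]}
ANBN = {"S": ["aSb", ""]}
MIRROR = {"S": ["aSa", "bSb", ""]}
```

For example, `generate(ANBN, "S", 6)` contains `"aaabbb"` but no word with unequal numbers of `a` and `b`.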


Closure properties I

• CF languages are closed under rational operations:
  - union (gather all the rules, avoiding name conflicts, and adding a new start rule $S \to S_1 \mid S_2$),
  - product ($S \to S_1S_2$),
  - and Kleene star ($S \to S_1S \mid \varepsilon$).
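These three constructions are short enough to write down directly. In the sketch below a grammar is a dictionary from nonterminal names to lists of right-hand sides (each a list of symbols); this representation, the `_1`/`_2` suffixes used to avoid name conflicts, and the function names are all choices made for illustration.

```python
def rename(rules, suffix):
    """Append a suffix to every nonterminal so two grammars cannot clash."""
    return {
        nt + suffix: [[sym + suffix if sym in rules else sym for sym in rhs]
                      for rhs in alternatives]
        for nt, alternatives in rules.items()
    }

def union(g1, s1, g2, s2):
    """New start rule S -> S1 | S2."""
    return {"S": [[s1 + "_1"], [s2 + "_2"]],
            **rename(g1, "_1"), **rename(g2, "_2")}

def product(g1, s1, g2, s2):
    """New start rule S -> S1 S2."""
    return {"S": [[s1 + "_1", s2 + "_2"]],
            **rename(g1, "_1"), **rename(g2, "_2")}

def star(g1, s1):
    """New start rule S -> S1 S | epsilon (the empty list)."""
    return {"S": [[s1 + "_1", "S"], []], **rename(g1, "_1")}

# Example input: a^n b^n, with the empty word as an empty right-hand side.
ANBN = {"S": [["a", "S", "b"], []]}
```

Each result is again a set of context-free rules, which is all the closure argument needs.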


Closure properties II : intersection

• CF languages are not closed under intersection

Example

$L_1 = \{a^ib^ic^j \mid i, j \geq 0\}$ is context-free:
$S \to XY$
$X \to aXb \mid \varepsilon$
$Y \to cY \mid \varepsilon$

$L_2 = \{a^ib^jc^j \mid i, j \geq 0\}$ is also context-free:
$S \to XY$
$X \to aX \mid \varepsilon$
$Y \to bYc \mid \varepsilon$

But $L_1 \cap L_2 = \{a^nb^nc^n \mid n \geq 0\}$ is not context-free.
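The set equality in this example can be checked exhaustively on short words. The sketch below uses simple counting predicates for membership (not the grammars themselves) and compares $L_1 \cap L_2$ with $\{a^nb^nc^n\}$ on every word over {a, b, c} up to a chosen length; the function names are hypothetical.

```python
import re
from itertools import product

def counts(word):
    """Return (#a, #b, #c) if word has the shape a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", word)
    return None if m is None else tuple(len(g) for g in m.groups())

def in_l1(word):
    c = counts(word)
    return c is not None and c[0] == c[1]

def in_l2(word):
    c = counts(word)
    return c is not None and c[1] == c[2]

def in_anbncn(word):
    c = counts(word)
    return c is not None and c[0] == c[1] == c[2]

def intersection_matches(max_len):
    """Check L1 ∩ L2 == {a^n b^n c^n} on all words up to max_len."""
    return all(
        (in_l1(w) and in_l2(w)) == in_anbncn(w)
        for n in range(max_len + 1)
        for w in map("".join, product("abc", repeat=n))
    )
```

This only confirms the equality on a finite sample; the fact that the intersection is not context-free still rests on the pumping lemma.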


Closure properties III: other results

• CF languages are not closed under complement: since they are closed under union, closure under complement would imply closure under intersection (De Morgan), which fails.
• CF languages are closed under intersection with a regular language.
• A subclass of CF languages, the deterministic CF languages, is closed under complement but not under union (one can easily define an inherently non-deterministic language as the union of two “independent” languages).


Final argument I

After many attempts by various scholars, attempts which are severely criticized and refuted in (Gazdar & Pullum, 1985), Shieber (1985) came up with a widely accepted answer:

1 In Swiss German, subordinate clauses can have a structure where all NPs precede all Vs. (13)

(13) Jan säit das  mer NP* es  huus  haend wele   V* aastrüche
     Jan said that we  NP* the house have  wanted V* paint
     ‘Jan said that we have wanted (that) V* NP* paint the house’

2 Among those subordinate clauses, those where all the dative NPs precede all the accusative NPs are well-formed. (14)

(14) ... das  mer d’chind          em Hans  es  huus      haend wele   laa hälfe aastrüche
     ... that we  the_children.acc Hans.dat the house.acc have  wanted let help  paint
     ‘... that we have wanted to let the children help Hans to paint the house’


Final argument II

3 The number of verbs requiring a dative has to be equal to the number of dative NPs; the same holds for the accusative.
4 The number of verbs in a subordinate clause is limited only by performance.

Let $R$ be the language:

$R$ = {Jan säit das mer (d’chind)$^h$ (em Hans)$^i$ es huus haend wele (laa)$^j$ (hälfe)$^k$ aastrüche, $i, j, k, h \geq 1$}

Then let $L$ = Swiss-German $\cap$ $R$ =

{Jan säit das mer (d’chind)$^m$ (em Hans)$^n$ es huus haend wele (laa)$^m$ (hälfe)$^n$ aastrüche, $m, n \geq 1$}

$L$ is not context-free, whereas $R$ is regular. Since CF languages are closed under intersection with a regular language, a context-free Swiss-German would make $L$ context-free.

⇒ Swiss-German is not context-free.
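The two languages of this argument can be sketched as follows. $R$ is written as a regular expression with four free counters; membership in $L$ then only adds the two cross-serial equalities. The function `in_L` assumes, as the argument does, that within $R$ grammaticality depends only on those counts; the function names are hypothetical.

```python
import re

# The regular language R, with four independent repetition counts.
R = re.compile(
    r"Jan säit das mer((?: d'chind)+)((?: em Hans)+) es huus haend wele"
    r"((?: laa)+)((?: hälfe)+) aastrüche"
)

def counts(sentence):
    """Return (h, i, j, k) if the sentence is in R, else None."""
    m = R.fullmatch(sentence)
    if m is None:
        return None
    # "em Hans" is two tokens wide, the other repeated units are one.
    return tuple(len(g.split()) // width
                 for g, width in zip(m.groups(), (1, 2, 1, 1)))

def in_L(sentence):
    """Membership in L = Swiss-German ∩ R, assuming grammaticality
    inside R only requires h == j and i == k."""
    c = counts(sentence)
    return c is not None and c[0] == c[2] and c[1] == c[3]
```

The dependencies checked by `in_L` cross each other (first NP group with first verb group, second with second), which is the $a^mb^nc^md^n$ pattern listed earlier as not context-free.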

Are NL context-sensitive?

Current proposal

1 The context-sensitive class seems too big: for instance $\{a^{2^i} \mid i \geq 0\}$ is context-sensitive.

2 Joshi (1985) proposed a subclass of type 1 languages, namely the class of mildly context-sensitive languages (MCSL). This class has the following properties:

• $ww$ is MCS
• $a^nb^nc^n$ is MCS
• $a^nb^nc^nd^n$ is MCS
• $a^ib^jc^id^j$ is MCS
• $a^nb^nc^nd^ne^n$ is not MCS
• $www$ is not MCS
• $ab^hab^iab^jab^kab^l$, $h > i > j > k > l > 1$ is not MCS
• $a^{2^i}$ is not MCS

Conjecture: NL $\in$ MCSL


More about MCSL

Interesting properties of MCSL:

• restricted growth: if $L$ is MCS, there is $k$ such that for all words $w \in L$, there is a word $w' \in L$ s.t. $|w| < |w'| \leq |w| + k$
• the word problem for MCSL has polynomial complexity

These properties are arguably shared by natural languages.

The formalism introduced by Joshi, Tree Adjoining Grammars, defines the class of MCSL.
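Restricted growth says that word lengths in the language never jump by more than a constant. A small sketch comparing the length sequences of $a^nb^nc^n$ and $a^{2^i}$ makes the contrast concrete; the helper `max_gap` and the sample ranges are choices made for illustration.

```python
def max_gap(lengths):
    """Largest difference between consecutive distinct word lengths."""
    ordered = sorted(set(lengths))
    return max(b - a for a, b in zip(ordered, ordered[1:]))

# Word lengths in {a^n b^n c^n}: 3n, so the gap is always 3.
anbncn_lengths = [3 * n for n in range(1, 20)]

# Word lengths in {a^(2^i)}: 2^i, so the gap doubles at every step.
power_lengths = [2 ** i for i in range(20)]
```

On any finite sample the gap of `anbncn_lengths` stays at 3, while the gap of `power_lengths` grows with the sample, so no constant $k$ can work for $a^{2^i}$.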

Syntactic formalisms

Chomskyan syntax

CFGs are not convenient, let alone sufficient, for expressing NL syntax.

Transformational grammar: CFG component (X-bar), augmented with a transformational component (e.g. passivisation) $\Rightarrow$ high expressive power

Government and Binding: CFG component (X-bar), supplemented with movements occurring rather freely on trees, and a multi-layered system.

Minimalist program: CFG component (X-bar), supplemented with various devices whose formal definition is often lacking.


Minimalist grammars (Stabler, 2011)


Here we use • to represent em, and ◦ to represent im. Since these functions apply unambiguously, the derived structure obtained at each internal node of this derivation tree is completely determined. So if C is the ‘start’ category of our example grammar G, then this derivation shows that the sentence who Marie praises $\in L(G1)$. Notice that the derivation tree is not isomorphic to the derived tree, numbered 10 just above.

Minimalist grammars (MGs), as defined here by (5), (6) and (8), have been studied rather carefully. It has been demonstrated that the class of languages definable by minimalist grammars is exactly the class definable by multiple context free grammars (MCFGs), linear context free rewrite systems (LCFRSs), and other formalisms [62,64,66,41]. MGs contrast in this respect with some other much more powerful grammatical formalisms (notably, the ‘Aspects’ grammar studied by Peters and Ritchie [76], and HPSG and LFG [5,46,101]):

[Figure: inclusion diagram of the language classes Fin, Reg, CF, MG, CS, Rec, RE and non-RE, with the label “Aspects, HPSG, LFG”]

The MG definable languages include all the finite (Fin), regular (Reg), and context free languages (CF), and are properly included in the context sensitive (CS), recursive (Rec), and recursively enumerable languages (RE). Languages definable by tree adjoining grammar (TAG) and by a certain categorial combinatory grammar (CCG) were shown by Vijay-Shanker and Weir to be sandwiched inside the MG class [103]. With all these results,

Theorem 1. $\mathrm{CF} \subset \mathrm{TAG} \equiv \mathrm{CCG} \subset \mathrm{MCFG} \equiv \mathrm{LCFRS} \equiv \mathrm{MG} \subset \mathrm{CS}$.

When two grammar formalisms are shown to be equivalent ($\equiv$) in the sense that they define exactly the same languages, the equivalence is often said to be ‘weak’ and possibly of little interest to linguists, since we are interested in the structures humans recognize, not in arbitrary ways of defining identical sets of strings. But the weak equivalence results of Theorem 1 are interesting. For one thing, the equivalences are established by providing recipes for translating one kind of grammar into another, and those recipes provide insightful comparisons of the recursive mechanisms of the respective grammars. Furthermore, when a grammar formalism is shown equivalent to […]
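To give a feel for how a multiple context free grammar goes beyond CF, here is a minimal sketch of a 2-MCFG for the copy language $ww$. The rules are a standard textbook-style illustration and are not taken from Stabler's text; the function names and the step bound `max_steps` are hypothetical. The nonterminal A derives pairs of strings, and the start rule concatenates the two components:

- A(ε, ε)
- A(sx, sy) ← A(x, y), for each letter s
- S(xy) ← A(x, y)

```python
def derive_pairs(max_steps, alphabet="ab"):
    """All pairs (x, y) derivable from A using at most max_steps rule steps."""
    pairs = {("", "")}        # rule A(eps, eps)
    frontier = set(pairs)
    for _ in range(max_steps):
        # rule A(sx, sy) <- A(x, y): the same letter is prefixed to both components
        frontier = {(s + x, s + y) for (x, y) in frontier for s in alphabet}
        pairs |= frontier
    return pairs


def language(max_steps, alphabet="ab"):
    """Rule S(xy) <- A(x, y): concatenate the two components of each pair."""
    return {x + y for (x, y) in derive_pairs(max_steps, alphabet)}


print(sorted(language(2)))
# ['', 'aa', 'aaaa', 'abab', 'baba', 'bb', 'bbbb']
```

Because one rule application updates both components at once, the two halves stay synchronized, which is the kind of dependency a single context-free nonterminal cannot enforce.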


Other formalisms

Lexical-Functional Grammar (Bresnan, 1982)
Head-Driven Phrase Structure Grammar (Pollard & Sag, 1994)
Tree-Adjoining Grammar (Joshi, 1985)
Combinatory Categorial Grammar (Steedman, 1988)
Dependency Grammars (Tesnière, 1959)

More details (lots of details) on Markus Dickinson’s web page: he taught a course at Indiana University entitled “Alternative Syntactic Theories” and you can find very good slides on this page: http://cl.indiana.edu/~md7/10/614/


References

Bar-Hillel, Yehoshua, Perles, Micha, & Shamir, Eliahu. 1961. On formal properties of simple phrase structure grammars. STUF-Language Typology and Universals, 14(1-4), 143–172.

Bresnan, Joan (ed). 1982. The Mental Representation of Grammatical Relations. MIT Press.

Chomsky, Noam. 1957. Syntactic Structures. Den Haag: Mouton & Co.

Gazdar, Gerald, & Pullum, Geoffrey K. 1985 (May). Computationally Relevant Properties of Natural Languages and Their Grammars. Tech. rept. Center for the Study of Language and Information, Leland Stanford Junior University.

Joshi, Aravind K. 1985. Tree Adjoining Grammars: How Much Context-Sensitivity is Required to Provide Reasonable Structural Descriptions? Tech. rept. Department of Computer and Information Science, University of Pennsylvania.

Langendoen, D. Terence, & Postal, Paul Martin. 1984. The vastness of natural languages. Oxford: Basil Blackwell.

Mannell, Robert. 1999. Infinite number of sentences. Part of a set of class notes on the Internet. http://clas.mq.edu.au/speech/infinite_sentences/.

Pollard, Carl, & Sag, Ivan A. 1994. Head-Driven Phrase Structure Grammar. Stanford: CSLI.

Shieber, Stuart M. 1985. Evidence against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3), 333–343.

Stabler, Edward P. 2011. Computational perspectives on minimalism. Oxford handbook of linguistic minimalism, 617–643.

Steedman, Mark. 1988. Combinators and Grammars. Pages 417–442 of: Oehrle, Richard T., Bach, Emmon, & Wheeler, Deirdre (eds), Categorial Grammars and Natural Language Structures, vol. 32. D. Reidel Publishing Co.

Tesnière, Lucien. 1959. Eléments de syntaxe structurale. Librairie C. Klincksieck.