Collected Lecture Slides for COMP 202
Formal Methods of Computer Science

Neil Leslie and Ray Nickson
11 October, 2005

This document was created by concatenating the lecture slides into a single sequence: one chapter per topic, and one section per titled slide. There is no additional material apart from what was distributed in lectures.
Neil Leslie and Ray Nickson assert their moral right to be identified as the authors of this work.
COMP202 introduces a selection of topics, focusing on the use of formal notations and formal models in the specification, design and analysis of programs, languages, and machines.
The focus is on language: syntax, semantics, and translation.
Covers fundamental aspects of Computer Science which have many important applications and are essential to advanced study in computer science.
Lectures
Monday, Wednesday and Thursday, 4.10-5pm in Hunter 323.
Tutorials
One hour per week.
Text book
“Introduction to Computer Theory” (2nd Edition), by Daniel Cohen, published by Wiley in 1997 (available from the University bookshop, approx. cost $135).
1.3 Assessment
Problem sets (10%)
The basis for tutorial discussion, and containing questions to write up for marking.
Programming projects (20%)
Due in weeks 6 and 11.
Test (15%)
Two hours, on Thursday 1 September. Exact time to be confirmed. Covers material from the first half of the course.
Exam (55%)
Three hours.
To pass COMP 202 you must achieve at least 40% in the final exam, and gain a total mark of at least 50%.
1.4 Tutorial and Marking Groups
Tutorials start NEXT WEEK (on Tuesday 12 July). There will be five groups for tutorials and marking; we expect all tutorials to be held in Cotton 245 (subject to confirmation).
Group  Time
1      Tuesday 12-12.50pm
2      Tuesday 2.10-3pm
3      Thursday 2.10-3pm
4      Friday 11-11.50am
5      Friday 12-12.50pm  DELETED
Please sign up for a group on the sheets posted outside Cotton 245. You need to sign up for a group even if you don’t intend to attend tutorials, as these will also be your marking groups.
1.5 What is Computer Science?
Computer Science involves (amongst other things):
• Describing complex systems, structures and processes.
• Reasoning about such descriptions, to establish desired properties.
• Animating/executing descriptions to obtain resulting behaviour.
• Transforming one description into another.
These are common threads in much of Computer Science.
Understanding them will form the main aim of this course.
1.6 Some Powerful Ideas
Computer Science has produced several powerful ideas to address these kinds of problems.
Languages and notations: programming languages, command languages, data definition languages, class diagrams, . . .
Mathematical models: graphs to model networks, trees to model program structures, . . .
COMP202 concentrates on the idea of language.
We will study techniques for describing and reasoning about different languages, from simple to complicated.
We will keep in mind the idea that understanding language and performing computation are closely related activities.
1.7 Related Areas
The course material is drawn mainly from, and used in:
• Programming language design, definition and implementation.
• Software specification, construction and verification.
• Formal languages and automata theory.
It also draws on mathematics (especially algebra and discrete maths), logic, and linguistics.
It has applications in areas such as Problem solving, User interface design, Networking protocols, Databases, Programming language design, and indeed in most computer applications.
1.8 Lecture schedule
0. Problems and programs [Weeks 1–2]
Describing problems, Describing algorithms and programs, Proving properties of algorithms and programs.
1. Regular Languages [Weeks 3–6]
Defining and recognising finite languages, Properties of strings and languages, Regular expressions, Finite automata, Properties of regular languages.
Text book: Part I.
2. Context Free Languages [Weeks 7–10]
Context free grammars, Push down automata, Parsing.
Text book: Part II.
3. Computability Theory [Weeks 11–12]
Recursively enumerable languages, Turing Machines, Computable functions.
Text book: Part III.
1.9 Conclusion
• COMP 202 will look at various techniques for defining, understanding,and processing languages.
• It will be a mixture of theory and practice.
• Mastery of concepts and techniques will be assessed by tests and exams; problem sets (tutorials and assignments) will test your ability to explore more deeply what you have learned; and projects will give you the opportunity to apply knowledge in practice.
• The Course Requirements document provides definitive information, and has links to various rules and policies about which you should be aware. READ IT!
Chapter 2
Problems and Algorithms
2.1 Problems and Algorithms
Recall that our focus in this course will be on languages.
Before we start studying how to precisely define languages, let us agree on some language and notation that we will use to talk about those definitions (a metalanguage).
In particular, let us describe the (semiformal) languages that we will use to:
• define problems
• express algorithms
• prove properties.
2.2 Problems, Programs and Proofs
• How can we specify a problem (independently of its solution)?
• How shall we describe a solution to a problem?
  – program
  – machine
• What does it mean to say that a given program/machine “solves” a givenproblem?
• How can we convince ourselves (and others) that a given program/machine“solves” a given problem?
2.3 Describing problems
How can we describe a “problem” for which we wish to write a computer program?
E.g. P1: add two natural numbers.
P2: sort a list of integers into ascending order.
P3: find the position of an integer x in a list l.
P4: what is the shortest route from Wellington to Auckland?
P5: is the text in file f a valid Java program?
P6: translate a C++ program into machine code.
In each case, we describe a mapping from inputs to outputs.
+----------+
| |
Input ---->| P |----> Output
| |
+----------+
To be more precise, we need to specify:
1. How many inputs, and what kinds of values they can have.
2. How many outputs, and what kinds of values they can have.
3. Which of the possible inputs are actually allowed.
4. What output is required/acceptable for each allowable input.
1,2. To define number and kinds of inputs and outputs, we need to define types.
Need “basic” types, e.g. integer, character, Boolean.
Combine these to give tuples, sequences/lists, sets, ...
Use mathematical types, rather than arrays & records.
Input and output domains = signature.
3. To define what inputs are allowed, specify a subset of the input domain by placing a precondition on the input values.
4. To define the required/acceptable output, specify a mapping from inputs to outputs.
Maybe a function; in general, a relation.
It is formalized by giving a postcondition linking inputs and outputs.
2.4 Example: Comparing Strings
Consider the following simple problem:
Determine whether two given character strings are identical.
If you need such an operation in a program you’re writing, you might:
• Use a built-in operation (e.g. s.equal).
• Look for existing code, or a published algorithm.
• Design and code an algorithm from scratch.
Before doing any of these, you should make sure you know exactly what problem you’re trying to solve, by defining the signature, precondition, and postcondition.

2.5 Understanding the Problem
Define the interface via a signature:
input: Strings s and t
output: Boolean r
or
Equal : String × String → Bool
Formalise the output constraint:
r ≡ s = t
where s = t is defined by |s| = |t| and (∀i ∈ 0 .. |s| − 1) s[i] = t[i]
Notation:
• s[i] is the ith element of s (starting from 0)
• |s| is the length of s
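The signature and postcondition can be read directly as executable code. A minimal Python sketch (Python and the name equal_spec are ours, not part of the course notation):

```python
def equal_spec(s, t):
    """Direct transcription of the postcondition:
    r holds iff |s| = |t| and s[i] = t[i] for every valid index i."""
    return len(s) == len(t) and all(s[i] == t[i] for i in range(len(s)))

print(equal_spec("abc", "abc"))  # True
print(equal_spec("abc", "abd"))  # False
```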
2.6 Designing an Algorithm
In designing an algorithm to determine whether two strings are identical, we need to consider:
• What style of algorithm to write:
– Iterative:
  while ... do
    ...
– or recursive:
  Equal(s, t) ≜ ... Equal(..., ...) ...
• Which condition to test first:
if |s| = |t| then
  while ... do
    ...
else
  ...

for each index position i in s do
  if s[i] = t[i] then
    ...
  else
    ...
...
• What operations to use in accessing the strings.
– Indexing/length
s[i]  Return ith element of s (starting from 0).
|s|   Return length of s.
– Head/tail
head(s)   Return first element of s.
tail(s)   Return all but first element of s.
empty(s)  Return true iff s is an empty string.
2.7 Defining the Algorithm Iteratively
Algorithm Equal;
input String s, String t;
output Boolean r where r ≡ s = t

r := true;
if |s| = |t| then
  for k from 0 to |s| − 1 do
    if s[k] ≠ t[k] then
      r := false
else
  r := false
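The iterative pseudocode can be transcribed into Python for experimentation (the name equal_iterative is ours):

```python
def equal_iterative(s, t):
    """Transcription of Algorithm Equal: set r to true, then scan
    for a mismatch, but only when the lengths agree."""
    r = True
    if len(s) == len(t):
        # for k from 0 to |s| - 1 do
        for k in range(len(s)):
            if s[k] != t[k]:
                r = False
    else:
        r = False
    return r
```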
2.8 Defining the Algorithm Recursively
Equal(s, t) ≜
  if empty(s) and empty(t) then true
  elsif empty(s) or empty(t) then false
  elsif head(s) ≠ head(t) then false
  else Equal(tail(s), tail(t))
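The recursive definition runs as an ordinary function once head, tail and empty are supplied. A Python sketch (all names are ours):

```python
def head(s):
    return s[0]          # first element of s

def tail(s):
    return s[1:]         # all but the first element of s

def empty(s):
    return len(s) == 0   # true iff s is an empty string

def equal_recursive(s, t):
    if empty(s) and empty(t):
        return True
    elif empty(s) or empty(t):
        return False
    elif head(s) != head(t):
        return False
    else:
        return equal_recursive(tail(s), tail(t))
```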
Chapter 3
Two Simple Programming Languages
3.1 Imperative and Applicative Languages
We will define two simple programming languages: one imperative, the other applicative.
The language of while-programs
• imperative: has assignments and explicit flow of control
• similar in style to Pascal, Ada, C++, Java
• corresponds to the iterative algorithm style
The language of applicative programs
• applicative: concerned with applying functions to arguments and evaluating expressions
• similar in style to Lisp, Scheme, and to functional languages such as Haskell (COMP304).
• corresponds to the recursive algorithm style
3.2 The Language of While-Programs
assignment statements x := e
x is a variable, e is an expression
We (usually) don’t worry about declarations
no-operation statement skip
Does nothing at all: just like ; by itself in C++
sequence S1;S2
Note that ; is a separator (like Pascal, unlike C++)
selection if cond then S1 else S2
Usually no need for {}
Will omit “else skip”
Can chain conditions:
if C1 then S1 elsif C2 then S2 · · · else Sn
iteration while cond do S
procedures: (a bit like static methods in Java)
procedure name(parameters);
begin
  S
end

The heading names the program, and lists its inputs and outputs. Can declare local variables (with types) after the heading: usually we won’t bother. The program can be invoked from other programs simply by naming it: if we have defined
procedure A(in x; out y); begin y := x end
we can then call
A(2, z)
with the same effect as z := 2.
3.3 Comparing strings again
procedure Equal(in s, t; out r);
begin
  if |s| = |t| then
    k := 0; r := true;
    while k < |s| do
      if s[k] ≠ t[k] then
        r := false;
      k := k + 1
  else
    r := false
end
3.4 The Applicative Language
Purely applicative (or functional) languages have no assignable variables.
They are more like mathematical functions.
Programming in this style may seem unfamiliar, but it is much easier to get right than imperative programming.
The basic constructs are function definition and function call. For example:
add(x, y) ≜ x + y

double(x) ≜ add(x, x)
Instead of a selection statement, we have a conditional expression:
if cond then E1 else E2
is an expression whose value is E1 or E2 according to the value of cond. For example:

abs(x) ≜
  if x ≥ 0 then
    x
  else
    −x
Instead of a looping construct, we use recursion.
The definition of a function f can include calls on f itself (mathematically a no-no, but familiar from recursive procedures/methods in imperative programming).
For example:
mul(x, y) ≜
  if x < 0 then
    −mul(abs(x), y)
  elsif x = 0 then
    0
  else
    add(y, mul(x − 1, y))
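These definitions run directly as Python functions. A sketch (absolute stands in for abs, to avoid shadowing Python’s built-in of the same name):

```python
def add(x, y):
    return x + y

def double(x):
    return add(x, x)

def absolute(x):
    # the applicative abs: conditional expression, no assignment
    return x if x >= 0 else -x

def mul(x, y):
    """Multiplication by repeated addition, exactly as defined above."""
    if x < 0:
        return -mul(absolute(x), y)
    elif x == 0:
        return 0
    else:
        return add(y, mul(x - 1, y))
```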
3.5 Those strings again ...
Equal(s, t) ≜
  if |s| = |t| then Equal′(s, t, 0) else false

Equal′(s, t, k) ≜
  if k ≥ |s| then true
  elsif s[k] ≠ t[k] then false
  else Equal′(s, t, k + 1)
3.6 Using head and tail
Equal(s, t) ≜
  if isempty(s) and isempty(t) then true
  elsif isempty(s) or isempty(t) then false
  elsif head(s) ≠ head(t) then false
  else Equal(tail(s), tail(t))
Chapter 4
Reasoning about Programs
4.1 Some Questions about Languages
1. Is program P a valid program in language L?
2. How many valid sentences are there in language L?
3. What sentences are in the intersection of languages L1 and L2?
4. How hard is the decision question for language L?
5. What does program P mean?
6. Does program P terminate when given input x?
7. What output does it produce?
8. Do programs P and Q do the same thing?
9. Is there a program in language L1 that does the same thing as program P in language L2?
10. Are languages L1 and L2 equally expressive?
4.2 Syntax
The first four questions are about syntax.
They concern the form of sentences in a language, not the meanings.
In Parts I and II (weeks 3–10) of the course, we will look at issues of syntax:
• What is a language?
• How can we define the set of all valid sentences?
• How can we (mechanically) decide whether a sentence is valid in a language?
• What relationships are there between different models of languages?
4.3 Semantics
Semantics tells us what a (syntactically valid) program means: what will happen if it is run with any given input.
Defining semantics is harder than defining syntax: we need a suitable (mathematical) model of program execution.
Such a model may be operational, denotational, or axiomatic:
• an operational model defines some machine and a procedure for translating programs into instructions for that machine (Part III of the course)
• a denotational model defines mathematical functions that directly capture the behaviour of the program (COMP 304)
• an axiomatic model provides rules for reasoning about the specifications that a program satisfies.
We will develop a simple axiomatic model for while-programs.
4.4 Comparing Strings One More Time
procedure Equal(in s, t; out r);
begin
0   if |s| = |t| then
1     k := 0;
2     r := true;
3     while k < |s| do
4       if s[k] ≠ t[k] then
5         r := false;
7       k := k + 1
8   else
9     r := false
10  end
How can we convince ourselves that it is correct?
• Testing.
• Mathematical reasoning.
We can reason about different input cases (case analysis) and the corresponding execution sequences:
• If |s| ≠ |t|, the test at line 0 fails, and the program sets r to false (line 9), which is the correct output in this case.
• If |s| = |t|, the test succeeds, and the program sets r to true (line 2), then goes into the loop (lines 3–7).
Now this kind of reasoning breaks down, because we don’t know how many times the program will go round the loop. We have to reason in a way that is independent of the number of iterations performed.
This means we need induction. We need to identify some property that:
• holds every time execution reaches the top of the loop (line 3)
• guarantees the required result when the loop exits (after line 7).
When execution reaches the top of the loop, we know:
• |s| = |t|
• 0 ≤ k ≤ |s|
• r = true iff s[0 .. k − 1] = t[0 .. k − 1]
i.e. r ≡ (∀i ∈ 0 .. k − 1)s[i] = t[i]
This is called a loop invariant. We need to show that:
1. The loop invariant holds on entry to the loop.
2. If the loop invariant holds at the top of the loop, and the loop body is executed, it will hold again at the top of the loop.
3. If the loop invariant holds at the top of the loop, and the loop exits, the required property (the postcondition) holds afterwards.
Together, these constitute a proof by induction that the loop does what is required.
1. Invariant holds on entry:
• |s| = |t| by the if condition.
• 0 ≤ k ≤ |s|, since k = 0 (line 1) and 0 ≤ |s| by definition.
• k = 0 means that (∀i ∈ 0 .. k − 1) s[i] = t[i] is trivially true; and r = true.
2. Invariant is maintained by the loop:
• |s| = |t| is unchanged, because s and t don’t change.
• 0 ≤ k ≤ |s| is true initially (induction hypothesis), and k < |s| (loop condition), so 0 ≤ k ≤ |s| − 1.
• r ≡ s[0 .. k − 1] = t[0 .. k − 1] initially (induction hypothesis).
Now, if s[k] = t[k] (line 4), r remains unchanged, and s[0..k] = t[0..k] iff s[0..k − 1] = t[0..k − 1].
On the other hand, if s[k] ≠ t[k], r becomes false (line 5), and so does s[0..k] = t[0..k].
In either case, 0 ≤ k ≤ |s| − 1 ∧ r ≡ s[0..k] = t[0..k].
Now, the next iteration of the loop will increment k, so whatever is true of k now will be true of k − 1 at the start of the next iteration.
So the loop invariant holds again, as required.
3. Postcondition holds at loop exit:
When
  |s| = |t|,
  0 ≤ k ≤ |s|,
  r ≡ (s[0 .. k − 1] = t[0 .. k − 1]), and
  ¬(k < |s|)
all hold, k = |s|, so r ≡ (s[0 .. |s| − 1] = t[0 .. |s| − 1]).
But s[0 .. |s| − 1] = s and t[0 .. |s| − 1] = t (remember |s| = |t|), hence r ≡ s = t.
Thus, the program will set r to true if s = t, and to false if s ≠ t.
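The three proof obligations can also be spot-checked mechanically, by asserting the invariant at the top of every iteration. A hypothetical Python harness (assertions test the invariant on particular runs; they do not replace the proof):

```python
def equal_checked(s, t):
    """The while-program, with the loop invariant asserted
    at the top of each iteration and the postcondition at exit."""
    if len(s) == len(t):
        k = 0
        r = True
        while k < len(s):
            # loop invariant: |s| = |t|, 0 <= k <= |s|,
            # and r iff s[0..k-1] = t[0..k-1]
            assert len(s) == len(t)
            assert 0 <= k <= len(s)
            assert r == (s[:k] == t[:k])
            if s[k] != t[k]:
                r = False
            k = k + 1
        # at exit k = |s|, so r iff s = t
        assert r == (s == t)
    else:
        r = False
    return r
```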
4.5 Laws for Program Verification
In verifying the above program, we used several kinds of knowledge and reasoning:
• Standard laws of mathematics/logic.
• Properties of operations used in the specification and program.
E.g. indexing, string length, string equality.
• Laws for reasoning about program execution: assignments and control structures.
Let us now make our use of the laws about execution a little more precise.
4.6 Reasoning with Assertions
Our reasoning was based on intuition about what must be true at particular points in the execution.
For example, before line 3 we knew that |s| = |t|, because we were inside the then part of the if statement at line 0. We also knew that k = 0 and r = true because of lines 1 and 2.
We can formalize that intuition by annotating the lines of the program with assertions: logical statements that are known to be true immediately before the line is executed.
Verification then proceeds by showing that assertions are indeed satisfied.
procedure Equal(in s, t; out r);
begin
{I0}  if |s| = |t| then
{I1}    k := 0;
{I2}    r := true;
{I3}    while k < |s| do
{I4}      if s[k] ≠ t[k] then
{I5}        r := false
{I6}      else skip;
{I7}      k := k + 1
{I8}  else
{I9}    r := false
{I10} end
I0 is just the precondition: in this case, I0 ≜ true.
I1 is true when we are inside the then part of the if. A law for if tells us I1 = I0 ∧ |s| = |t|; that is, I1 ≜ |s| = |t|.

Law for if:  {P} if C then {P ∧ C} S1 else {P ∧ ¬C} S2
I3 is the loop invariant. There is no easy way to find a loop invariant: the only way is to understand how the loop works. In this case, we know that 0 ≤ k ≤ |s|, and that the purpose of the loop is to decide whether s = t “up to but not including” element k. We also know that |s| = |t|, from I1. Hence:
I3 ≜ 0 ≤ k ≤ |s| ∧ (r ≡ s[0 .. k − 1] = t[0 .. k − 1]) ∧ |s| = |t|.
We must show that the loop invariant holds initially; that is, {I1} k := 0; {I2} r := true {I3} is correctly annotated.
This requires reasoning about a sequence (;) of assignment statements (:=).

Law for ;:  from {P} S1 {Q} and {Q} S2 {R}, infer {P} S1; S2 {R}

Our S1 is k := 0 and our S2 is r := true. Our P, Q, and R are respectively I1, I2, and I3. We must find I2 such that {I1} k := 0 {I2} and {I2} r := true {I3}.

Law for :=  {Q[e/x]} x := e {Q}
For {I2} r := true {I3}, we need I2 to be I3[true/r]: that is,
I2 ≜ 0 ≤ k ≤ |s| ∧ (true ≡ s[0 .. k − 1] = t[0 .. k − 1]) ∧ |s| = |t|.
Now, {I1} k := 0 {I2} follows as long as I1 is I2[0/k]: indeed, 0 ≤ 0 ≤ |s| ∧ (true ≡ s[0 .. 0 − 1] = t[0 .. 0 − 1]) ∧ |s| = |t| is the same as I1 by simplification.
Immediately inside the loop, the loop invariant (I3) still holds, and k ≤ |s| − 1, by the loop condition. (We are using a Law for while here, but let’s skip the formality.) Hence:
I4 ≜ (r ≡ s[0 .. k − 1] = t[0 .. k − 1]) ∧ |s| = |t| ∧ k ≤ |s| − 1.
The if law tells us that s[k] ≠ t[k] at I5, so:
I5 ≜ I4 ∧ s[k] ≠ t[k].
The assignment law justifies adding r = false (check it!):
I6 ≜ I5 ∧ r = false.
Before line 7, either the then part was taken, in which case I6 holds; or, it was not taken, in which case I4 still holds (skip law). So:
I7 ≜ (I6) ∨ (s[k] = t[k] ∧ I4).
After line 7, we require I3 to hold again, to maintain the loop invariant. Hence, we must prove: {I7} k := k + 1 {I3}. The assignment law will justify this (check it).
After the loop exits, we know that the loop invariant I3 still holds, and that k = |s|. Thus:
I8 ≜ 0 ≤ |s| ≤ |s| ∧ (r ≡ s[0..|s| − 1] = t[0..|s| − 1]) ∧ |s| = |t|.
In the else branch of the main if, we get:
I9 ≜ |s| ≠ |t|.
In that case, we assign r := false.
Finally, at end the two branches of the outer if come together:
I10 ≜ (I8) ∨ (I9 ∧ r = false),
which implies r ≡ s = t (exercise).
Part I
Regular Languages
Chapter 5
Preliminaries
5.1 Part I: Regular Languages
In this part of COMP 202 we will begin to look at formal languages, and at how they can be defined.
In this lecture we will introduce some of the basic notions required. In subsequent lectures we will look at:
• defining languages using regular expressions
• defining languages using finite automata
We will also explore the relationship between these ways of defining languages.
5.2 Formal languages
When we say ‘formal’ we mean that we are concerned simply with the form (or the syntax) of the language.
We are not concerned with the meaning (or semantics) of the symbols.
The theory of syntax is very much more straightforward and better understood than the theory of semantics.
We are not concerned with trying to understand ‘natural’ languages like English or Maori, nor even with the question of whether the tools we develop to deal with formal languages are appropriate for dealing with natural languages.
5.3 Alphabet
Definition 1 (Alphabet) An alphabet is a finite set of symbols.
In the textbook, Cohen separates the elements of a set with space rather than a comma.
The symbols themselves are meaningless, so we may as well pick a convenient alphabet: Cohen uses {a b} in his examples.
Conventionally we use Σ and (occasionally) Γ as names for alphabets.
5.4 String
Definition 2 (String) A string over an alphabet is a finite sequence of symbols from that alphabet.

Example: Some strings
If we have an alphabet Σ = {✢, ❦, ➸, ❐, ❇} then
1. ✢
2. ❦➸
3. ❦➸❦❇➸❐➸
4.
5. ❦✢➸✢❦✢❇➸❐➸
6. ❐❦✢
are strings over Σ.
String number 4 is the empty string. We need a better notation for the empty string than simply not writing anything, so we use a meta-symbol for it.
Different authors use different meta-symbols, the most common ones being ε, λ, and Λ. Cohen chooses Λ, so we might as well, too.
Note that Λ is a string over any alphabet, but Λ is not a symbol in the alphabet.
5.5 Language, Word
Definition 3 (Language) A language is a set of strings.
Definition 4 (Word) A word is a string in a language.
Much of the rest of this course is devoted to investigating the related problems of:
• how we can define languages, and
• how we can show whether or not a given string is a word in a language.
5.6 Operations on Strings and Languages
Definition 5 (Concatenation) If s and t are strings then the concatenation of s and t, written st, is a string.

st consists of the symbols of s followed immediately by the symbols of t.
We freely allow ourselves to concatenate single symbols onto strings.
We can concatenate two languages. If L1 and L2 are languages then:
L1L2 = {st | s ∈ L1, t ∈ L2}
Definition 6 (Length of a string) The length of a string is the number of occurrences of symbols in it.

If s is a string we write |s| for the length of s.
We can give a recursive algorithm for | |:
• |Λ| = 0
• if x is a single symbol and y a string, then |xy| = 1 + |y|
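The two clauses translate directly into a recursive function. A Python sketch (the name length stands in for | |; "" represents Λ):

```python
def length(s):
    """Recursive length of a string, following the definition of | |."""
    # |Λ| = 0
    if s == "":
        return 0
    # |xy| = 1 + |y|, where x is the first symbol and y the rest
    return 1 + length(s[1:])
```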
Definition 7 (Kleene closure (Kleene *)) If S is a set then S∗ is the set consisting of all the sequences of elements of S.

Note: Λ is in S∗.
We can define S∗ inductively:
• Λ ∈ S∗
• if x ∈ S and y ∈ S∗ then xy ∈ S∗
If Σ is an alphabet then Σ∗ is a language over Σ.
If L is a language then L∗ is a language.
Example: Kleene closure
• if Σ = {1, 0} then Σ∗ = {Λ, 1, 0, 11, 10, 01, 00, . . .}
• if L = {❦❦, ❦➸} then L∗ = {Λ, ❦❦, ❦➸, ❦❦❦❦, ❦❦❦➸, ❦➸❦❦, ❦➸❦➸, . . . }
Question: what is (Σ∗)∗?
Definition 8 (Kleene +) If S is a set then S+ is the set consisting of all the non-empty sequences of elements of S.

Note: Λ is not in S+, unless it was in S.
We can define S+ inductively:
• if x ∈ S then x ∈ S+
• if x ∈ S and y ∈ S+ then xy ∈ S+
We can also define S+ as SS∗.
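Since S∗ is infinite whenever S is non-empty, a program can only enumerate a finite slice of it. A Python sketch (the bound max_parts, the number of elements concatenated, is our own device, not part of the definition; "" represents Λ):

```python
from itertools import product

def kleene_star(S, max_parts):
    """Concatenations of at most max_parts elements of S:
    a finite slice of S*. Works when S is an alphabet (single
    symbols) or a language (whole strings)."""
    return {"".join(p)
            for n in range(max_parts + 1)
            for p in product(S, repeat=n)}

def kleene_plus(S, max_parts):
    # S+ = S* without Λ (when Λ is not itself in S)
    return kleene_star(S, max_parts) - {""}
```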
5.7 A proof
Throughout this course we will perform proofs by induction, so we begin here with a simple one.
When we say that a string is a sequence of symbols over an alphabet Σ we mean that a string is either:
• the empty string Λ, or
• a symbol from Σ followed by a string over Σ
So, if we want to prove properties of strings over some alphabet Σ we can proceed as follows:
• prove the property holds of the empty string Λ, and
• prove the property holds of a string xy where x is an arbitrary element of Σ and y an arbitrary string over Σ, assuming the property holds of y.
More formally, in order to show P (s) for an arbitrary string s over an alphabet Σ, show:
Base case P (Λ)
Induction step P (xy), given P (y), where x is an arbitrary element of Σ, y an arbitrary string over Σ
This sort of pattern of inductive proof occurs in very many situations in the mathematics required for computer science.
5.8 Statement of the conjecture
The conjecture that we will prove by induction is:
Conjecture 1 |st| = |s| + |t|
In English: “the length of s concatenated with t is the sum of the lengths of s and t”.
We proceed by induction on s, where P (s) is |st| = |s| + |t|.
5.9 Base case
P (Λ) is |Λt| = |Λ| + |t|.
We show that the LHS and the RHS of this equality are just the same.
On the LHS we use the fact that Λt = t, to give us |t|.
On the RHS we observe that |Λ| = 0 and 0 + n = n to give us |t|.
Thus LHS = RHS, and the base case is established.
5.10 Induction step
P (xy) is |(xy)t| = |xy| + |t|.
We must show |(xy)t| = |xy| + |t|, given |yt| = |y| + |t|.
On the LHS we use the associativity of concatenation to observe that (xy)t = x(yt). Next we use the definition of | | to see that the LHS is 1 + |yt|.
The RHS is |xy| + |t|. We use the definition of | | to see that the RHS is 1 + |y| + |t|. Now we use the induction hypothesis |yt| = |y| + |t| to show that the RHS is 1 + |yt|.
So we have shown that LHS = RHS in the induction step.
5.11 A theorem
We have established both the base case and the induction step, so we have completed the proof.
Now our conjecture can be upgraded to a theorem:
Theorem 1 |st| = |s| + |t|
Chapter 6
Defining languages using regular expressions
6.1 Regular expressions
In this lecture we will see how we can define a language using regular expressions.
The first step is to give an inductive definition of what a regular expression is.
Before we define regular expressions we will give a similar definition for expressions of arithmetic, to illustrate the method used.
6.2 Arithmetic expressions
We will now give a formal inductive description of the expressions we have in arithmetic.
• if n ∈ N then n is an arithmetic expression;
• if e1 and e2 are expressions then so are
– −(e1)
– (e1 + e2)
– (e1 − e2)
– (e1 ∗ e2)
– (e1/e2)
– (e1^e2) (we could have chosen to use a symbol, such as ^, rather than just use layout)
This definition forces us to include lots of brackets:
• ((1+2)+3)
• (1+(2+3))
are expressions, but
• 1+(2+3)
• 1+2+3
are not.
6.3 Simplifying conventions
We do not need to use all these brackets if we adopt some simplifying conventions.
First we allow ourselves to drop the outermost brackets, so we can write:
• (1+2)+3
rather than:
• ((1+2)+3)
We also adopt some conventions about the precedence and associativity of the operators.
6.4 Operator precedence
• + and − are of lowest precedence
• ∗ and / are next
• ^ is highest
So we can write:
• 2 + 3 ∗ 4^5
rather than:
• 2 + (3 ∗ (4^5))
6.5 Operator associativity
We also adopt a convention that all the operators are left associative. This means that if we have a sequence of operators of the same precedence we fill in the brackets to the left.
So we can write:
• 1 + 2 + 3
rather than:
• (1 + 2) + 3
We can have right associative operators, or non-associative operators.
6.6 Care!
Be careful not to confuse the two statements:
1. # is a left associative operator
2. # is an associative operation
1 means that we can write x#y#z instead of (x#y)#z.
2 means that (∀xyz). (x#y)#z = x#(y#z).
In the examples above + and / are both left associative. However, addition is associative, but division is not:
60/6/2 = 5
60/(6/2) = 20
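The distinction can be checked directly in any language whose / is left associative; a small Python sketch:

```python
# Division is left associative, so 60/6/2 is parsed as (60/6)/2:
print(60 / 6 / 2)    # 5.0
print(60 / (6 / 2))  # 20.0

# Addition is associative, so the grouping does not matter:
print((1 + 2) + 3 == 1 + (2 + 3))  # True
```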
6.7 Defining regular expressions
If Σ is an alphabet then:
• the empty language ∅ and the empty string Λ are regular expressions
• if r ∈ Σ then r is a regular expression
• if r is a regular expression then so is (r∗)
• if r and s are regular expressions then so are
– (r + s)
– (rs)
6.8 Simplifying conventions
We allow ourselves to dispense with outermost brackets.
∗ has the highest precedence and + the lowest.
We take all operators to be left-associative, so we write:
• r + s + t instead of (r + s) + t
• rst instead of (rs)t
6.9 What the regular expressions describe
So far we have said what the form of a regular expression is, without explaining what it describes. This is as if we had explained the form of the arithmetic expressions without explaining what the operations of addition, subtraction and so on were.
The regular expressions describe sets of strings over Σ, that is languages, so we must explain what languages the regular expressions describe.
We write Language(r) for the language described by regular expression r.
When we are being sloppy we use r for both the regular expression and the language it describes.
Language(∅)
The language described by ∅ is the empty language {}.

Language(Λ)
The language described by Λ is {Λ}.
Note {} ≠ {Λ}.

Language(r), r ∈ Σ
The language described by r, r ∈ Σ, is {r}.
Example:If Σ = {0, 1} then:
Language(0) = {0}
Language(1) = {1}
Language(r∗)
The language described by r∗ is the Kleene closure of the language described by r.
Example:If Σ = {0, 1} then:
Language(0∗) = {Λ, 0, 00, 000, . . .}
Language(1∗) = {Λ, 1, 11, 111, . . .}
Language(r + s)
The language described by r + s is the union of the languages described by r and s.
Chapter 7
Regular languages
7.1 Regular Languages
In the last lecture we introduced the regular expressions and the operations that the regular expressions correspond to.
The regular expressions allow us to describe languages. A language which can be described by a regular expression is called a regular language.
If Σ is an alphabet and r and s are regular expressions, then:
• Language(∅) is ∅
• Language(Λ) is {Λ}
• Language(r), r ∈ Σ is {r}
• Language(r∗) is (Language(r))∗
• Language(r + s) is Language(r) ∪ Language(s)
• Language(rs) is Language(r)Language(s)
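The six clauses can be read as a recursive evaluator. A Python sketch (the tuple encoding of regular expressions and the length bound max_len are our own devices; the bound is needed because Language(r∗) is infinite in general):

```python
def language(r, max_len):
    """Language(r), restricted to strings of length <= max_len.
    r is None (∅), "" (Λ), a single symbol, or a tuple:
    ("+", r, s), ("cat", r, s), ("*", r)."""
    if r is None:                       # Language(∅) = {}
        return set()
    if isinstance(r, str):              # Language(Λ) = {Λ}; Language(a) = {a}
        return {r} if len(r) <= max_len else set()
    op = r[0]
    if op == "+":                       # union
        return language(r[1], max_len) | language(r[2], max_len)
    if op == "cat":                     # concatenation of languages
        return {s + t
                for s in language(r[1], max_len)
                for t in language(r[2], max_len)
                if len(s + t) <= max_len}
    if op == "*":                       # Kleene closure, as a fixpoint
        base = language(r[1], max_len)
        result = {""}                   # Λ is always in the closure
        while True:
            bigger = result | {s + t for s in result for t in base
                               if len(s + t) <= max_len}
            if bigger == result:
                return result
            result = bigger
```

For example, language(("*", ("+", "0", "1")), 2) yields all strings over {0, 1} of length at most 2.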
7.2 Example languages
In this lecture we will look at the languages which we can describe with regular expressions. We will usually restrict attention to Σ = {0, 1}.
Example: Language(0 + 1) = {0, 1}.
{00, 01, 10, 11} is Language(00 + 01 + 10 + 11).
Notice that {00, 01, 10, 11} is also Language((1 + 0)(1 + 0)), but to show that the language is regular we only need to provide one regular expression describing it.
7.4 An algorithm
The proof that every finite language is regular provides us with an algorithm which takes a finite language and returns a regular expression which describes the language.
Many of the proofs that we see in this course have an algorithmic character. Such algorithmic proofs are often called constructive, and provide the basis for a program.
Conversely every program is a constructive proof (even if we are usually not interested in what it is a proof of).
7.5 EVEN-EVEN
We now describe a language which Cohen introduces, which he calls EVEN-EVEN.

EVEN-EVEN = Language((00 + 11 + (01 + 10)(00 + 11)∗(01 + 10))∗)

It may not be immediately obvious what language this is (or rather, whether there is a simple description of this language). However every word of EVEN-EVEN contains an even number of 0’s and an even number of 1’s.
Furthermore EVEN-EVEN contains every string with an even number of 0’s and an even number of 1’s.
7.6 Deciding membership of EVEN-EVEN
Suppose we are given the task of writing a program to decide whether a given string is in EVEN-EVEN. How would we go about this task?

One method would be to use two counters, n0 and n1, and to go through the string counting all the 0’s and all the 1’s. If n0 and n1 are both even at the end of the string then the string is in EVEN-EVEN.
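The two-counter method just described can be sketched directly; this is our own illustration, not code from the slides.

```python
def in_even_even_counters(s):
    # Count all the 0's and all the 1's, then check both counts are even.
    n0 = n1 = 0
    for c in s:
        if c == "0":
            n0 += 1
        else:
            n1 += 1
    return n0 % 2 == 0 and n1 % 2 == 0

print(in_even_even_counters("0110"))   # True
print(in_even_even_counters("011"))    # False
```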
If we call the pair 〈n0, n1〉 a state of the program then our program will need to be able to go through as many states as there are symbols in a string to decide membership.
Is there a program which uses fewer states?
7.7 Using fewer states
We don’t actually care how many 0’s and 1’s there are, so we could use two boolean flags b0 and b1. As we go through the string we flip the appropriate flag as we read each symbol. If both the flags end up in the same state that they started in then the string is in EVEN-EVEN.

If we call the pair 〈b0, b1〉 a state of the program then our program will need to be able to go through at most four distinct states to decide membership.
7.8 Using even fewer states
Suppose, instead of reading the symbols one by one, we read them two by two. Now we need only use one boolean flag, and we do not have to flip it all the time.

If both the symbols we read are the same then we leave the flag alone. If they differ then we flip the flag. Two different symbols means we have just read an odd number of 0’s and an odd number of 1’s. If we had read an even number before we have now read an odd number; if we had read an odd number before we have now read an even number.
If the flag ends up in the same state it began in then the string is in EVEN-EVEN.
The program need go through at most two distinct states to decide membership.
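The two-symbols-at-a-time method can be sketched as follows (our own illustration; the name `in_even_even` is ours). A string of odd length cannot be in EVEN-EVEN, since its two counts cannot both be even.

```python
def in_even_even(s):
    if len(s) % 2 != 0:        # odd length: one count must be odd
        return False
    flag = True                # the single boolean state
    for i in range(0, len(s), 2):
        if s[i] != s[i + 1]:   # a mixed pair: flip the flag
            flag = not flag
    return flag

print(in_even_even("0110"))    # True
print(in_even_even("01"))      # False
```

Each mixed pair contributes one 0 and one 1, so the flag tracks the (shared) parity of the two counts, exactly as the slide argues.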
7.9 Uses of regular expressions
Regular expressions turn out to have practical uses in a variety of places.
We can informally think of a regular expression as describing the most typical string in a language.

The tokens of a programming language can usually be described by regular expressions. For example, a type name might be an uppercase letter followed by a sequence of uppercase letters, lowercase letters or underscores, and so on.

The task of identifying the tokens in a program is called lexical analysis, and is the first step in compiling the program. The UNIX utility lex is a lexical-analyser generator: given regular expressions describing the tokens of a language it generates a program which will perform lexical analysis. Since regular expressions describe typical strings they are the basis for many searching and matching utilities.
The UNIX utility grep allows the user to specify a regular expression to search for in a file. Many text editors provide a facility similar to grep for searching for strings.

Programming languages which are oriented towards text processing usually allow us to describe strings using regular expressions.
Chapter 8
Finite automata
8.1 Finite Automata
We leave regular expressions and introduce finite automata. Finite automata provide us with another way to describe languages.

Note: Cohen uses the term ‘finite automaton’ (‘FA’) where many other authors use the term ‘deterministic finite automaton’ (‘DFA’). After we have introduced deterministic finite automata we will introduce non-deterministic finite automata. In common with everybody else, Cohen calls these ‘NFAs’.
8.2 Formal definition
Definition 9 (Finite automaton) A finite automaton is a 5-tuple (Q, Σ, δ, q0, F) where:
• Q is a finite set: the states
• Σ is a finite set: the alphabet
• δ is a function from Q× Σ to Q: the transition function
• q0 ∈ Q: the start state
• F ⊆ Q: the final or accepting states.
Note: F may be Q, or F may be the empty set.
8.3 Explanation
While this is all very well, it does not help us see what finite automata do, or how.

Suppose we have a string made up of symbols from Σ. We begin in the start state q0. We read the first symbol s from the string, and then enter the state
given by δ(q0, s). We then read the next symbol from the string and use the transition function to move to a new state, repeating this process until we reach the end of the string.
If we have read all the string, and are in one of the accepting states, we say that the automaton accepts the string.
The language accepted by the automaton is the set of all the strings it accepts.
So automata give us a way to describe languages.
8.4 An automaton
Suppose we have an automaton M1 = (Q,Σ, δ, q0, F ), where:
• Q = {S1, S2, S3}
• Σ = {0, 1}
• δ(S1, 1) = S2
• δ(S2, 1) = S3
• q0 = S1
• F = {S3}
Let’s see if M1 accepts 1. We begin in state S1 and we read 1. δ(S1, 1) = S2 so we enter S2. Our string is empty, but S2 is not a final state, so M1 does not accept 1.

Let’s see if M1 accepts 11. We begin in state S1 and we read 1. δ(S1, 1) = S2 so we enter S2. Now we read 1. δ(S2, 1) = S3 so we enter S3. Our string is empty, and S3 is an accepting state so M1 accepts 11.

Let’s see if M1 accepts 0. We begin in state S1 and we read 0. δ is a partial function, and there is no value given for δ(S1, 0). We cannot make any progress here, and so 0 is not in the language accepted by M1. We can think of the machine as ‘crashing’ on this string.
Note: here we differ from Cohen. He insists (initially, at least) that δ be total, and adds a ‘black hole’ state to all his machines, whereas we allow δ to be partial.
Cohen would have a new state S4, and would extend δ with:
• δ(S1, 0) = S4
• δ(S2, 0) = S4
• δ(S4, 0) = S4
• δ(S4, 1) = S4
Since S4 is not an accepting state, and since once we enter it there is no way to leave, this extension does not change the language accepted by the machine.
8.5 The language accepted by M1
A moment’s reflection will show that the only string which M1 accepts is 11, and so the language M1 accepts is Language(11).
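M1 is small enough to simulate directly. The following sketch (ours, not from the slides) stores the partial transition function as a dictionary; a missing entry models the machine ‘crashing’.

```python
delta = {("S1", "1"): "S2", ("S2", "1"): "S3"}
start, accepting = "S1", {"S3"}

def accepts(s):
    state = start
    for symbol in s:
        if (state, symbol) not in delta:
            return False       # crash: no transition defined
        state = delta[(state, symbol)]
    return state in accepting

print(accepts("11"))   # True
print(accepts("1"))    # False
print(accepts("0"))    # False (the machine crashes)
```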
8.6 Another automaton
Suppose we have an automaton M2 = (Q,Σ, δ, q0, F ), where:
• Q = {S1, S2, S3}
• Σ = {0, 1}
• δ(S1, 1) = S2
• δ(S2, 1) = S3
• δ(S3, 0) = S3
• δ(S3, 1) = S3
• q0 = S1
• F = {S3}
M2 is nearly the same as M1, but now we have transitions from S3 to itself on reading either a 0 or a 1.

What language does M2 accept? Clearly the only way to get from S1 to S3 is to begin with two 1’s. If the string is just 11 it will be accepted.

What about 111? This will be accepted too, as δ(S3, 1) = S3. What about 110? This will be accepted too, as δ(S3, 0) = S3.

In fact any extension of 11 will be accepted, so we see that M2 accepts Language(11(0 + 1)∗).
8.7 A pictorial representation of FA
It is often easier to see what an FA does if we draw a picture of it. We can draw a finite automaton out as a labelled, directed graph. Each state of the machine is a node, and the transition function tells us which nodes are connected by which edges. We mark the start state, and any accepting states, in some special way.
M1 can be represented as:
[Diagram: start state 1, an arc labelled 1 to state 2, and an arc labelled 1 to accepting state 3.]
M2 can be represented as:
[Diagram: as for M1, with an additional loop labelled 1, 0 on accepting state 3.]
Note: Sometimes the start state is pointed to by an arrow, and the accepting states are drawn as double circles or squares, e.g.:
[Diagrams: the machine M2 drawn again in these alternative styles, with an incoming arrow marking the start state and the accepting state drawn as a double circle or square.]
8.8 Examples of constructing an automaton
If L is a language over some alphabet Σ, then L̄, the complement of L, is {s | s ∈ Σ∗ and s ∉ L}.
If we have some automaton ML which accepts L, then we can construct an automaton ML̄ which accepts L̄.

ML̄ will accept just the strings which ML does not, and will not accept just the strings which ML does.

To recognise the complement of a language, we can think of ourselves as going through the same steps as we would take to recognise the language, but making the opposite decision at each state.

We then expect ML and ML̄ to have the same states, but an accepting state of ML will not be an accepting state of ML̄ and vice versa.
8.9 Constructing ML̄
M2 from above accepts Language(11(0 + 1)∗). Our first suggestion for a machine M3 to accept the complement of Language(11(0 + 1)∗) is to have:
• the same set of states,
• the same transition function,
• the complement of the set of accepting states of M2.
M3 = (Q,Σ, δ, q0, F ), where:
• Q = {S1, S2, S3}
• Σ = {0, 1}
• δ(S1, 1) = S2
• δ(S2, 1) = S3
• δ(S3, 0) = S3
• δ(S3, 1) = S3
• q0 = S1
• F = {S1, S2}
F(M3) is Q − F(M2), as you would expect. Graphically:
[Diagram: start state 1 (accepting), an arc labelled 1 to accepting state 2, an arc labelled 1 from 2 to state 3, and a loop labelled 1, 0 on state 3.]
This automaton correctly accepts Λ and the string 1. What about the string 0?

The machine crashes, so: close, but no cigar.
The problem is that M2 and M3 both crash on the same strings: M3 should accept the strings that M2 crashes on, and vice versa.

What we have to do is to convert our initial machine into a machine which accepts the same language and whose transition function is total. We can always do this. (How?)

Then we construct a third machine whose set of accepting states is the complement of that of the second machine.
8.10 A machine to accept the complement of Language(11(0 + 1)∗)
M4 = (Q,Σ, δ, q0, F ), where:
• Q = {S1, S2, S3, S4}
• Σ = {0, 1}
• q0 = S1
• F = {S1, S2, S4}
• δ(S1, 0) = S4
• δ(S1, 1) = S2
• δ(S2, 0) = S4
• δ(S2, 1) = S3
• δ(S3, 0) = S3
• δ(S3, 1) = S3
• δ(S4, 0) = S4
• δ(S4, 1) = S4
Graphically:
[Diagram: M4 has start state 1 (accepting), arcs labelled 1 from 1 to accepting state 2 and from 2 to state 3, a loop labelled 1, 0 on state 3, arcs labelled 0 from 1 and from 2 to accepting state 4, and a loop labelled 1, 0 on state 4.]
8.11 Summary
We have outlined a method which allows us to construct a machine which accepts L̄ from a machine which accepts L.

We can think of this construction as a proof of the theorem:

Theorem 3 If a language L can be defined using an FA, then so can the language L̄.
Chapter 9
Non-deterministic finite automata
9.1 Non-deterministic Finite Automata
We now introduce a variant on the finite automaton, the non-deterministic finite automaton (NFA).

The difference between an NFA and an FA is that, in an NFA, more than one arc leading from a state may be labelled by the same symbol.
Formally, the transition function of an NFA takes a state and a symbol andreturns a set of states.
A string can now label more than one path through the automaton.
The string itself does not determine which state we will end up in when we read it: there is some non-determinism built into the machine.
9.2 Preliminaries
Powerset: If A is a set then 2^A is the powerset of A, i.e. the set of all subsets of A.

{} ∈ 2^A

A ∈ 2^A

Partial and total functions: A partial function is not defined for some values of its domain. If f is a partial function and f(x) is defined, we write f(x) ↓.
9.3 Formal definition
Definition 10 (Non-deterministic finite automaton) A non-deterministic finite automaton is a 5-tuple (Q, Σ, δ, q0, F) where:
• Q is a finite set: the states
• Σ is a finite set: the alphabet
• δ is a function from Q × Σ to 2^Q: the transition function
• q0 ∈ Q: the start state
• F ⊆ Q: the final or accepting states.
Note: δ is always total.
9.4 Just like before. . .
An NFA accepts a string if there is a path from the start state to an accepting state labelled by the string.

The set of strings accepted by an NFA is the language it accepts.
9.5 Example NFA
M5 = (Q,Σ, δ, q0, F ), where
• Q = {S1, S2, S3, S4}
• Σ = {0, 1}
• q0 = S1
• F = {S2, S3}
• δ(S1, 0) = {S2, S3}
• δ(S1, 1) = {}
• δ(S2, 0) = {S4}
• δ(S2, 1) = {S4}
• δ(S3, 0) = {}
• δ(S3, 1) = {S3}
• δ(S4, 0) = {S2}
• δ(S4, 1) = {S2}

For clarity and conciseness, we sometimes choose to show δ as a table:

δ  | 0        | 1
S1 | {S2, S3} | {}
S2 | {S4}     | {S4}
S3 | {}       | {S3}
S4 | {S2}     | {S2}
And, of course we can give a pictorial representation. M5 can be drawn:
[Diagram: start state 1 has two arcs labelled 0, to accepting states 2 and 3; state 3 has a loop labelled 1; states 2 and 4 are joined by arcs labelled 0, 1 in both directions.]
9.6 M5 accepts 010
Now let’s see whether some strings are in the language defined by M5. We will try to find a path from the start state to an accepting state.
We begin with 010.
We start in S1, and the first symbol in the string is 0. Now we have a choice, as there are two transitions out of S1 labelled by 0.

We choose to go to S2. The 1 takes us to S4, and the final 0 brings us back to S2. We have exhausted our string, and we are in an accepting state, so M5 accepts 010.
9.7 M5 accepts 01
Next we try 01.

We start in S1, and the first symbol in the string is 0. Now we have a choice, as there are two transitions out of S1 labelled by 0.

As before we choose to go to S2. The next transition takes us to S4. Our string is exhausted, but we are not in an accepting state.

If this were a deterministic automaton then we would know that the string was not in the language. However this is a non-deterministic machine, and there may be another path from the start state to an accepting state labelled by 01.

We backtrack to the place where we made a choice, and pick the other arc labelled by 0. We see that this string is accepted by M5.
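Instead of backtracking, we can track the whole set of states reachable so far; this sketch (ours, not from the slides) simulates M5 that way, anticipating the breadth-first idea used later for the subset construction.

```python
delta = {("S1", "0"): {"S2", "S3"}, ("S1", "1"): set(),
         ("S2", "0"): {"S4"},       ("S2", "1"): {"S4"},
         ("S3", "0"): set(),        ("S3", "1"): {"S3"},
         ("S4", "0"): {"S2"},       ("S4", "1"): {"S2"}}
start, accepting = "S1", {"S2", "S3"}

def accepts(s):
    current = {start}
    for symbol in s:
        # All states reachable from any current state on this symbol.
        current = set().union(*(delta[(q, symbol)] for q in current))
    return bool(current & accepting)

print(accepts("010"))   # True
print(accepts("01"))    # True
print(accepts("1"))     # False
```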
9.8 Comments
A little thought shows that M5 accepts Language(01∗ + 0((1 + 0)(1 + 0))∗).
The following FA, M6 = (Q,Σ, δ, q0, F ), accepts the same language:
• Q = {S1, S2, S3, S4, S5, S6}
• Σ = {0, 1}
• q0 = S1
• F = {S3, S5, S6}
δ  | 0  | 1
S1 | S3 | S2
S2 |    |
S3 | S4 | S5
S4 | S6 | S6
S5 | S6 | S3
S6 | S4 | S4

(The row for S2 is empty: δ is partial, and M6 crashes in S2.)
9.9 NFA’s are as powerful as FA’s
Now we move on to prove two theorems about the languages definable by FA’s and NFA’s.

Theorem 4 Every language definable by an FA is definable by an NFA.

The proof of this consists of an algorithm which takes an arbitrary FA and constructs an NFA which accepts the same language.
9.10 Proof
We have an FA, MFA = (Q, Σ, δ, q0, F), and we will construct an NFA MNFA = (Q′, Σ′, δ′, q′0, F′) which accepts the same language.

Clearly Σ′ = Σ is the only sensible choice. It also seems reasonable to keep the same structure for the machine and take:
• Q′ = Q
• q′0 = q0
• F ′ = F
The only real difference is in the two transition functions:
• one may be partial, the other is total;
• the range of one is states, the range of the other is sets of states.
The solution is simple:
δ′(S, σ) = if δ(S, σ) ↓ then {δ(S, σ)} else {}
The definition that we have given, (Q′, Σ′, δ′, q′0, F′), is certainly an NFA. It should also be clear that any string which labels a path from the start state to an accepting state in the FA will do so in the NFA, and only strings which label such a path in the FA will do so in the NFA.

Hence the two languages are the same. Thus we have shown that the descriptive power of NFA’s is at least as strong as that of FA’s.

This should not be a surprise, as NFA’s were introduced as a generalisation of FA’s.
9.11 Using the graphical representation
Given that the graphical representation of an FA is the graphical representation of an NFA which accepts the same language, we could have just drawn the picture and said “Look!”
That is too easy to be a real proof.
9.12 FA’s are as powerful as NFA’s
Theorem 5 Every language definable by an NFA is definable by an FA.

The proof of this consists of an algorithm which takes an arbitrary NFA and constructs an FA which accepts the same language.

This is a more remarkable result, as NFA’s were introduced as a generalisation of FA’s.

Alas we cannot look at a picture and go “Ha!” We are forced to make a careful construction.
The key idea is that the states of the FA we construct will be sets of states of the original NFA.

When we tried to see if M5 would accept strings we used a strategy rather like depth-first search. We pursued a path until we were either successful or stymied, and then retraced our steps to make alternate choices.

Suppose instead we had kept a record of all the possible states we could have reached as we worked our way along the string. This is rather like breadth-first search.

For the string 010 and M5 we could have reached either of the states in {S2, S3} after we had read 0, any state reachable from S2 or S3 on 1 next, and so on.
9.13 Useful observations
• If T is a set of states of some machine then T ∈ 2^Q.

• If Q is finite then so is 2^Q.

• Given some T ∈ 2^Q, the set of all states reachable on some symbol σ ∈ Σ is ⋃{δ(t, σ) | t ∈ T}.

• For each T ∈ 2^Q and σ ∈ Σ there is just one such ⋃{δ(t, σ) | t ∈ T}.

• Because there is just one set of states reachable on a given symbol from a given set of states, we have got determinism back.
We still have to sort out what the start states and accepting states are.

The singleton set of the start state of the NFA is the obvious candidate for the start state for our FA.

A string is accepted by the NFA if there is any path from the start state to an accepting state labelled by that string. Therefore we will take every set which contains an accepting state of the NFA to be an accepting state of the FA we are constructing.

Where are we? We now know:
• what the alphabet will be
• what the states of our FA look like,
• what the start state of our FA will be
• what the accepting states will be
• what the transition function will look like.
We still have to give a full description of the states and the transition function.
9.14 Construction
After all that, we now give a method to construct an FA

MFA = (Q′, Σ′, δ′, q′0, F′)
which accepts the same language as an NFA
MNFA = (Q,Σ, δ, q0, F )
As expected: Σ′ = Σ, and q′0 = {q0}.

We will have Q′ ⊆ 2^Q. We don’t take Q′ = 2^Q, as we need only concern ourselves with the states we can actually reach.
Hence we construct δ′ and Q′ in tandem.
9.15 Constructing δ′ and Q′
Start with q′0 = {q0}, and let δ′(q′0, σ) = δ(q0, σ) for each σ ∈ Σ.

This will generate new states of the FA, and we continue this process, constructing δ′ using δ until no new states are created.
Any state of the FA which contains an accepting state of the NFA is an accepting state of the FA.
Finally we may tidy things up by giving nice names to the states of the FA!
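The construction can be sketched in code (ours, not the slides’), applied here to M5; the δ′ table and reachable states are built in tandem with a worklist.

```python
delta = {("S1", "0"): {"S2", "S3"}, ("S1", "1"): set(),
         ("S2", "0"): {"S4"},       ("S2", "1"): {"S4"},
         ("S3", "0"): set(),        ("S3", "1"): {"S3"},
         ("S4", "0"): {"S2"},       ("S4", "1"): {"S2"}}
alphabet = {"0", "1"}
start, accepting = "S1", {"S2", "S3"}

def subset_construction():
    q0 = frozenset({start})
    states, worklist, new_delta = {q0}, [q0], {}
    while worklist:
        T = worklist.pop()
        for a in alphabet:
            # Union of delta over all states in T (empty union is {}).
            U = frozenset(set().union(*(delta[(q, a)] for q in T)))
            new_delta[(T, a)] = U
            if U not in states:
                states.add(U)
                worklist.append(U)
    finals = {T for T in states if T & accepting}
    return states, new_delta, q0, finals

states, new_delta, q0, finals = subset_construction()
print(len(states))   # 6 reachable subset-states
```

Only six of the sixteen subsets of M5’s states are reachable, which matches the six states of the FA M6 given earlier.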
We have shown that, although NFA’s look like an extension of FA’s, they can accept exactly the same languages.
Now we will introduce an extension of NFA’s, NFA with Λ transitions.
9.18 Motivation
We will show that NFA’s with Λ transitions can be used to accept exactly the same languages as NFA’s.

Since we have already shown that NFA’s and FA’s can be used to accept exactly the same languages, it follows that NFA’s with Λ transitions and FA’s can be used to accept exactly the same languages.

We introduce NFA’s with Λ transitions because it is easy to construct an NFA with Λ transitions which accepts the language described by a regular expression. An FA is an easily implementable recogniser for a language, so we have a way to go from a description of typical strings in the language to a recogniser.

We can extend the notion of an NFA to include automata with Λ transitions, that is, with transitions that may occur on reading no symbols from the string. The main difference is in the transition function: δ is now a function from Q × (Σ ∪ {Λ}) to 2^Q.
9.19 Formal definition
Definition 11 (Non-deterministic finite automaton with Λ transitions) A non-deterministic finite automaton with Λ transitions is a 5-tuple (Q, Σ, δ, q0, F) where:
• Q is a finite set: the states
• Σ is a finite set: the alphabet
• δ is a function from Q × (Σ ∪ {Λ}) to 2^Q: the transition function
Theorem 6 Every language definable by an NFA is definable by an NFA with Λ transitions.
The proof is easy, as the machines are almost identical.

If M8 = (Q, Σ, δ, q0, F) is an NFA, then M9 = (Q′, Σ′, δ′, q′0, F′) is an NFA with Λ transitions which accepts the same language, if:

Q′ = Q, Σ′ = Σ, q′0 = q0, F′ = F
And for S ∈ Q, σ ∈ Σ:
δ′(S, σ) = δ(S, σ)
δ′(S,Λ) = {}
Once again we could have just drawn the diagram representing the NFA, and announced that it also represented an NFA with Λ transitions.
9.23 Harder theorem
Theorem 7 Every language definable by an NFA with Λ transitions is definable by an NFA.
As you might expect, most of the hard work lies in constructing the transition function.

If M10 = (Q, Σ, δ, q0, F) is an NFA with Λ transitions then we will construct M11 = (Q′, Σ′, δ′, q′0, F′), an NFA which accepts the same language. We begin with:
Q′ = Q
Σ′ = Σ
q′0 = q0
9.24 δ′ the new transition function
δ′(S, σ) must now give us the set of states which are reachable from S on a σ transition, and also on any combination of Λ transitions and a σ transition.

Some care is needed, as we may be able to make several Λ transitions before we make the σ transition, and we may be able to make several Λ transitions after we make the σ transition.
We need to find every state which is reachable from S by:
(Λ transition)∗(σ transition)(Λ transition)∗
We can call this the Λ closure of δ(S, σ).

Another way to think of what we are doing is that we are removing the arcs labelled with Λ, and adding ‘direct’ arcs labelled with a symbol.
9.25 F ′ the new set of accepting states
We might initially guess that F′ = F, but this is wrong.

Suppose q0 is not an accepting state, but there is some accepting state which can be reached by a sequence of Λ transitions from q0, i.e. the initial state is not an accepting state but the NFA with Λ transitions accepts the empty string.
If this is the case then F′ = F ∪ {q0}; otherwise F′ = F.
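The Λ closure and the new δ′ can be sketched as follows (our own code; `"Lam"` stands for Λ, and the tiny example machine is hypothetical).

```python
def lambda_closure(states, delta):
    # Every state reachable from `states` by zero or more Lambda arcs.
    closure, worklist = set(states), list(states)
    while worklist:
        q = worklist.pop()
        for p in delta.get((q, "Lam"), set()):
            if p not in closure:
                closure.add(p)
                worklist.append(p)
    return closure

def new_delta(S, sigma, delta):
    # (Lambda)* (sigma) (Lambda)*: close, step on sigma, close again.
    before = lambda_closure({S}, delta)
    after = set().union(*(delta.get((q, sigma), set()) for q in before))
    return lambda_closure(after, delta)

# A tiny hypothetical machine: S1 -Lam-> S2, S2 -a-> S3
delta = {("S1", "Lam"): {"S2"}, ("S2", "a"): {"S3"}}
print(sorted(new_delta("S1", "a", delta)))   # ['S3']
```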
9.26 Example: an NFA equivalent to M6
M6 = (Q, Σ, δ, q0, F) as above. M12 = (Q′, Σ′, δ′, q′0, F′) where Q′ = Q, Σ′ = Σ, q′0 = q0.

There is no accepting state reachable on Λ transitions from S1, so F′ = F.

We construct δ′:

δ′ | 0                | 1
S1 | {S2, S4}         | {S1, S2, S3, S4, S5}
S2 | {S4}             | {S1, S2, S3, S4, S5}
S3 | {S2, S4}         | {S1, S2, S3, S4, S5}
S4 | {S1, S2, S4, S5} | {S1, S2, S4, S5}
S5 | {S2, S4}         | {S1, S2, S3, S4, S5}
9.27 Graphical representation of M12

[Diagram not reproduced.]
9.28 Regular expression to NFA with Λ transitions
Now we will show:
• any language which is describable by a regular expression is describable by a non-deterministic finite automaton with Λ moves,

• hence any language which is describable by a regular expression is describable by a non-deterministic finite automaton,

• hence any language which is describable by a regular expression is describable by a finite automaton.
9.29 Proof outline
The proof is by induction, and its algorithmic content allows us to write a program which takes a regular expression and constructs an automaton which recognises the same language.

Recall that the regular expressions are defined as follows: if Σ is an alphabet then:
• the empty language ∅ and the empty string Λ are regular expressions
• if r ∈ Σ then r is a regular expression
• if r and s are regular expressions then so are r + s, rs, r∗
9.30 Base cases
The base cases are:
• the empty language ∅
• the empty string Λ
• if r ∈ Σ then r
In these cases we just present an appropriate automaton.
9.31 Induction steps
The induction steps are:
• r + s
• rs
• r∗
In these cases we assume that we have automata to accept r and s, and we make use of these machines.
9.32 Proof
Base case: ∅. The following NFA with Λ transitions accepts no strings. M∅ = (Q, Σ, δ, q0, F) where:
Q = {S1} Σ = {} q0 = S1 δ(S1,Λ) = {} F = {}
Informally: a machine with no accepting states (and not much else).
Base case: Λ. The following machine accepts the empty string. MΛ = (Q, Σ, δ, q0, F) where:
Q = {S1} Σ = {} q0 = S1 δ(S1,Λ) = {} F = {S1}
Informally: a machine whose start state is the only accepting state.
Base case: r, r ∈ Σ. The following machine accepts Language(r), for some r ∈ Σ. Mr = (Q, Σ, δ, q0, F) where:

Q = {S1, S2}, q0 = S1, F = {S2}

δ  | r    | σ ≠ r | Λ
S1 | {S2} | {}    | {}
S2 | {}   | {}    | {}

Informally: a machine with one accepting state, which can only be arrived at on an r.
Induction step: r∗. Assume that we have an automaton Mr which accepts Language(r), and show how to make an automaton which accepts Language(r∗).

Informally: we need a way to allow ourselves to go through Mr 0, 1, 2, 3, . . . times. We make the initial state of Mr an accepting state, and add a Λ transition from each accepting state to the initial state.

Let Mr = (Q, Σ, δ, q0, F) be an NFA with Λ transitions which accepts Language(r). Then Mr∗ = (Q′, Σ′, δ′, q′0, F′), where:

Q′ = Q, Σ′ = Σ, q′0 = q0, F′ = F ∪ {q0}
δ′(S, σ) = δ(S, σ)
δ′(S, Λ) = δ(S, Λ), S ∉ F
δ′(S, Λ) = δ(S, Λ) ∪ {q0}, S ∈ F

is an NFA with Λ transitions which accepts Language(r∗).
Induction step: r + s. Assume that we have automata Mr and Ms which accept Language(r) and Language(s), and show how to combine them into a machine to accept Language(r + s).

Informally: we need a way to allow ourselves to go through either Mr or Ms. We can do this if we add a new start state with Λ transitions to the start states of Mr and Ms. The accepting states of the new machine will be the union of the accepting states of Mr and Ms. The two machines may make use of different alphabets, and we need to take care over this.

Let Mr = (Qr, Σr, δr, q0r, Fr) and Ms = (Qs, Σs, δs, q0s, Fs). Then Mr+s = (Q′, Σ′, δ′, q′0, F′), where q′0 is a new state, and:

Q′ = Qr ∪ Qs ∪ {q′0}, Σ′ = Σr ∪ Σs, F′ = Fr ∪ Fs
δ′(q′0, Λ) = {q0r, q0s}
δ′(S, σ) = δr(S, σ), S ∈ Qr, σ ∈ Σr
δ′(S, σ) = {}, S ∈ Qr, σ ∉ Σr
δ′(S, Λ) = δr(S, Λ), S ∈ Qr
δ′(S, σ) = δs(S, σ), S ∈ Qs, σ ∈ Σs
δ′(S, σ) = {}, S ∈ Qs, σ ∉ Σs
δ′(S, Λ) = δs(S, Λ), S ∈ Qs

is an NFA with Λ transitions which accepts Language(r + s).
Induction step: rs. Assume that we have automata Mr and Ms which accept Language(r) and Language(s), and show how to combine them into a machine to accept Language(rs).

Informally: we need a way to allow ourselves to go through Mr and then Ms. We start at the start state of Mr. From each accepting state of Mr we add a Λ transition to the start state of Ms. The accepting states of the new machine will be the accepting states of Ms. The two machines may make use of different alphabets, and we need to take care over this.

Let Mr = (Qr, Σr, δr, q0r, Fr) and Ms = (Qs, Σs, δs, q0s, Fs).
Then Mrs = (Q′, Σ′, δ′, q′0, F′), where:

Q′ = Qr ∪ Qs, Σ′ = Σr ∪ Σs, q′0 = q0r, F′ = Fs
δ′(S, σ) = δr(S, σ), S ∈ Qr, σ ∈ Σr
δ′(S, σ) = {}, S ∈ Qr, σ ∉ Σr
δ′(S,Λ) = δr(S,Λ) ∪ {q0s}, S ∈ Fr
δ′(S, Λ) = δr(S, Λ), S ∈ Qr, S ∉ Fr
δ′(S, σ) = δs(S, σ), S ∈ Qs, σ ∈ Σs
δ′(S, σ) = {}, S ∈ Qs, σ ∉ Σs
δ′(S,Λ) = δs(S,Λ), S ∈ Qs
is an NFA with Λ transitions which accepts Language(rs).
9.33 Proof summary
We have shown that we can perform the construction in each of the base cases and in each of the inductive steps. Hence the proof is finished.

We can use this proof to give us an algorithm to recursively construct an NFA with Λ transitions, given a regular expression.
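The recursive algorithm can be sketched as follows (our own code, not the slides’ notation): regular expressions are nested tuples, `"Lam"` stands for Λ, and state names are generated fresh. A small Λ-closure-based `accepts` is included to check the result.

```python
from itertools import count

fresh = count()          # generates fresh state names

def build(r):
    """Return (states, delta, start, finals) for an NFA with Lam transitions."""
    if isinstance(r, str):               # base case: a single symbol
        s, f = next(fresh), next(fresh)
        return {s, f}, {(s, r): {f}}, s, {f}
    op = r[0]
    if op == "*":
        Q, d, s, F = build(r[1])
        for q in F:                      # Lam arc back to the start
            d.setdefault((q, "Lam"), set()).add(s)
        return Q, d, s, F | {s}          # start state becomes accepting
    if op == "+":
        Q1, d1, s1, F1 = build(r[1])
        Q2, d2, s2, F2 = build(r[2])
        s = next(fresh)                  # new start with Lam arcs to both
        return Q1 | Q2 | {s}, {**d1, **d2, (s, "Lam"): {s1, s2}}, s, F1 | F2
    if op == ".":                        # concatenation
        Q1, d1, s1, F1 = build(r[1])
        Q2, d2, s2, F2 = build(r[2])
        d = {**d1, **d2}
        for q in F1:                     # Lam arc into the second machine
            d.setdefault((q, "Lam"), set()).add(s2)
        return Q1 | Q2, d, s1, F2

def accepts(machine, w):
    Q, d, s, F = machine
    def close(T):                        # Lam closure of a set of states
        T, stack = set(T), list(T)
        while stack:
            for p in d.get((stack.pop(), "Lam"), set()):
                if p not in T:
                    T.add(p); stack.append(p)
        return T
    cur = close({s})
    for c in w:
        cur = close(set().union(*(d.get((q, c), set()) for q in cur)))
    return bool(cur & F)

m = build(("+", ("*", "a"), (".", "b", "a")))   # a* + ba, as in the example
print(accepts(m, ""), accepts(m, "aaa"), accepts(m, "ba"), accepts(m, "b"))
# True True True False
```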
9.34 Example
We take the regular expression a∗ + ba through NFA with Λ transitions, and NFA, to FA.
[Diagrams omitted: NFAs with Λ transitions are built for a, for b, for a∗, for ba, and finally for a∗ + ba, following the constructions above. Eliminating the Λ transitions gives a 7-state NFA in which states 1 (the start state), 3 and 7 are accepting. The subset construction then yields an FA with states 1 (start, accepting), 23 (accepting), 4, 56 and 7 (accepting), which simplifies to an equivalent, smaller FA.]
Chapter 10
Kleene’s theorem
10.1 A Theorem
Theorem: Every language accepted by an FA is generated by a regular expression.

Proof: omitted, due to industrial action.

Once again, the proof is constructive: it gives us an algorithm which, given an FA, constructs a regular expression that generates the same language.
We have now established that, in terms of the languages that they can be used to describe, all of the following are equivalent:
• regular expressions
• finite automata
• non-deterministic finite automata
• non-deterministic finite automata with Λ transitions
10.2 Kleene’s theorem
Previously we gave a definition of a regular language as one which is described by a regular expression.

We can re-formulate the equivalence of regular expressions, FA’s and NFA’s as:
Theorem 8 (Kleene) A language is regular iff it is accepted by an FA.
A language is regular iff it is accepted by an NFA.
A language is regular iff it is generated by a regular grammar.
Note: iff is an abbreviation of ‘if, and only if’.

Cohen characterises Kleene’s theorem as:

the most important and fundamental theorem in the theory of finite automata
Chapter 11
Closure Properties of Regular Languages
11.1 Closure properties of regular languages

We will now show some closure properties of the set of regular languages. We will show that:
• the complement of a regular language is regular
• the union of two regular languages is regular
• the concatenation of two regular languages is regular
• the Kleene closure of a regular language is regular
• the intersection of two regular languages is regular
11.2 Formally
If L1 and L2 are regular languages then so are:
• L̄1
• L1 ∪ L2
• L1L2
• L∗1
• L1 ∩ L2
We use Kleene’s theorem to prove these.
11.3 Complement
Theorem 9 If L1 is a regular language, then so is L̄1.

We have already shown this in a previous lecture, where we showed that we could construct an FA to accept L̄1, given an FA which accepted L1.

If L1 is regular then there is an FA which accepts it. If there is an FA which accepts L̄1 then L̄1 is regular.
11.4 Union
Theorem 10 If L1 and L2 are regular languages then so is L1 ∪ L2.
The language L1 ∪ L2 is the set of all strings in either L1 or L2.

If L1 is regular then there is a regular expression r which describes it. If L2 is regular then there is a regular expression s which describes it. Then the regular expression r + s describes L1 ∪ L2.

Since L1 ∪ L2 is described by a regular expression it is regular.
11.5 Concatenation
Theorem 11 If L1 and L2 are regular languages then so is L1L2.
The language L1L2 is the set of all strings which consist of a string from L1 followed by a string from L2.

If L1 is regular then there is a regular expression r which describes it. If L2 is regular then there is a regular expression s which describes it. Then the regular expression rs describes L1L2.

Since L1L2 is described by a regular expression it is regular.
11.6 Kleene closure
Theorem 12 If L1 is a regular language then so is L∗1.
The language L∗1 is the set of all strings which are (possibly empty) sequences of strings in L1.

If L1 is regular then there is a regular expression r which describes it. Then the regular expression r∗ describes L∗1.

Since L∗1 is described by a regular expression it is regular.
11.7 Intersection
Theorem 13 If L1 and L2 are regular languages then so is L1 ∩ L2.
Note: for any sets A and B, A ∩ B is the complement of Ā ∪ B̄.

Since L1 is regular, so is L̄1. Since L2 is regular, so is L̄2.

Since L̄1 and L̄2 are regular, so is L̄1 ∪ L̄2.

Since L̄1 ∪ L̄2 is regular, so is its complement, which is L1 ∩ L2.
11.8 Summary of the proofs
We could have performed all these proofs via FA’s, NFA’s or regular grammars if we had wanted to.

We have now shown that what appear to be very different ways to define languages are all equivalent, and moreover that the class of languages that they define is closed under various operations. We have also shown that all finite languages are regular.
The next obvious question is:
• are there any languages which are not regular?
We will answer this question next.
Chapter 12
Non-regular languages
12.1 Non-regular Languages
So far we have seen one sort of abstract machine, the finite automaton, and the sort of language that this sort of machine can accept, the regular languages.
We will now show that there are languages which cannot be accepted by finite automata.
Outline of Proof:
Suppose we have a language L and we want to show it is non-regular.
A language is non-regular just when it is not regular.
As a general rule of logic if we wish to show ¬P we assume P and derive a contradiction.
Hence, to show L is not regular we must show that a contradiction follows if we assume that L is regular.
When we are trying to derive a contradiction from the assumption that L is regular we usually make use of Kleene’s theorem.
If L is regular then there is a FA which accepts L.
We have already shown that all finite languages are regular, so if we are to show that L is non-regular then L had better be infinite.
Note: there are lots of infinite regular languages: e.g. Language((1 + 0)∗).
Not all infinite languages are non-regular, but all non-regular languages are infinite.
The general technique is to show that there is some string which must be accepted by the FA, but which is not in L.
A FA has, by definition, a finite number of states.
Any sufficiently long string in L will trace a path through any FA accepting L which visits some state more than once.
We then attempt to use this fact to give examples of other strings which any FA which accepts L will accept, but which are not in L.
However, if there are such strings, then no FA accepts L. This contradicts our assumption that L was regular.
Hence L is not regular, i.e. L is non-regular.
Comments on this argument
This argument is perfectly good, but it glosses over the fact that we may have to do some thinking to show that there are strings which do the trick for us.
And, of course, we may be attempting to show that some regular language is not regular. In this case we will fail!
12.2 Pumping lemma
Theorem 14 (Pumping lemma) If L is a regular language then there is a
number p, such that (∀s ∈ L)(|s| ≥ p ⊃ s = xyz), where:
1. (∀i ≥ 0)xyiz ∈ L
2. |y| > 0
3. |xy| ≤ p
We call p the pumping length.
(∀s ∈ L)(|s| ≥ p ⊃ s = xyz) reads ‘for every string s in L, if s is at least as long as the pumping length, then s can be written as xyz’.
12.3 Pumping lemma informally
The pumping lemma tells us that, for long enough strings s ∈ L, s can be written as xyz such that:
• y is not Λ and
• xz,
• xyz,
• xyyz,
• xyyyz,
• xyyyyz,. . . are all in L.
We say that we can ‘pump’ s and still get strings in L.
12.4 Proving the pumping lemma
We now give a formal proof of the pumping lemma.
We have three things to prove corresponding to conditions 1, 2 and 3 in the pumping lemma. Let:
• M = (Q,Σ, δ, q0, F ) be a FA which accepts L,
• p be the number of states in M (i.e. p = Cardinality(Q))
• s = s1s2 . . . sn−1sn be a string in L such that n ≥ p
• r1 = q0
• ri+1 = δ(ri, si), 1 ≤ i ≤ n
Then the sequence r1r2 . . . rnrn+1 is the sequence of states that the machine goes through to accept s. The last state rn+1 is an accepting state.
This sequence has length n + 1, which is greater than p. The pigeonhole principle tells us that in the first p + 1 items in r1r2 . . . rnrn+1 one state must occur twice.
We suppose it occurs first as rj and second as rl.
Notice: l ≠ j, and l ≤ p + 1.
Now let:
• x = s1 . . . sj−1
• y = sj . . . sl−1
• z = sl . . . sn
So:
• x takes M from r1 to rj
• y takes M from rj to rj
• z takes M from rj to rn+1
Hence M accepts xyiz, i ≥ 0. Thus we have shown that condition 1 of the pumping lemma holds.
Because l ≠ j we know that |y| ≠ 0. Thus we have shown that condition 2 of the pumping lemma holds.
Because l ≤ p + 1 we know that |xy| ≤ p. Thus we have shown that condition 3 of the pumping lemma holds.
Hence the pumping lemma holds.
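The proof is constructive: given a DFA and a long enough accepted string, we can compute x, y and z by tracking the state sequence and splitting at the first repeated state. A sketch in Python (the even-number-of-1s DFA is our own illustrative example):

```python
def pump_split(delta, start, s):
    # r1 = start; r_{i+1} = delta(r_i, s_i): record the state sequence
    states = [start]
    for ch in s:
        states.append(delta[(states[-1], ch)])
    seen = {}
    for idx, st in enumerate(states):
        if st in seen:                       # pigeonhole: st = r_j = r_l
            j, l = seen[st], idx
            return s[:j], s[j:l], s[l:]      # x, y, z
        seen[st] = idx
    return None                              # |s| < number of states

def accepts(delta, start, finals, w):
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in finals

# DFA over {0,1} accepting strings with an even number of 1s (p = 2 states)
delta = {("e", "0"): "e", ("e", "1"): "o",
         ("o", "0"): "o", ("o", "1"): "e"}

x, y, z = pump_split(delta, "e", "0110")
for i in range(4):                           # condition 1: x y^i z is accepted
    assert accepts(delta, "e", {"e"}, x + y * i + z)
```

Conditions 2 and 3 hold by construction: the split happens at the first repetition, which must occur within the first p + 1 states.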
12.5 A non-regular language
As an example we will show that {0n1n | n ≥ 0} is not a regular language.
We begin the proof by assuming that this is a regular language, so there is some machine N which accepts it.
Hence, by the pumping lemma, there must be some integer k, such that the string 0k1k can be pumped to give a string which is also accepted by N.
We let xyz = 0k1k, and show that xyyz is not in {0n1n | n ≥ 0}.
There are three cases to consider:
1. y is a sequence of 0s
2. y is a sequence of 0s followed by a sequence of 1s
3. y is a sequence of 1s
In case 1 xyyz will have more 0s than 1s, and so xyyz ∉ L.
In case 3 xyyz will have more 1s than 0s, and so xyyz ∉ L.
In case 2 xyyz will have two occurrences of the substring 01, and so xyyz ∉ L.
So in each case the assumption that {0n1n | n ≥ 0} is regular leads to a contradiction.
So {0n1n | n ≥ 0} is not regular.
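The case analysis can also be checked mechanically for a concrete pumping length: every split of 0p1p satisfying conditions 2 and 3 fails to pump. A brute-force sketch in Python (the choice p = 5 is arbitrary):

```python
def in_L(w):                         # membership in {0^n 1^n | n >= 0}
    n = len(w) // 2
    return w == "0" * n + "1" * n

p = 5
s = "0" * p + "1" * p
for xy_len in range(1, p + 1):       # condition 3: |xy| <= p
    for x_len in range(xy_len):      # condition 2: |y| = xy_len - x_len > 0
        x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
        # pumping up just once already leaves the language
        assert not in_L(x + y + y + z)
```

Because |xy| ≤ p forces y to consist only of 0s, xyyz always has more 0s than 1s, which is exactly case 1 of the argument above.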
Comment
Clearly we can write an algorithm to decide whether a string is in {0n1n | n ≥ 0}. This algorithm cannot be represented by a finite state machine. Hence there must be more powerful abstract machines than FAs.
12.6 Pumping lemma re-cap
If L is a regular language then there is a number p, such that (∀s ∈ L)(|s| ≥ p ⊃ s = xyz), where:
1. (∀i ≥ 0)xyiz ∈ L
2. |y| > 0
3. |xy| ≤ p
12.7 Another non-regular language
We will show that the language
L1 = {w | w has the same number of 0’s as 1’s}
is non-regular.
We start by assuming that L1 is regular. Then we will show that there is a string in L1 which cannot be pumped. We let p be the pumping length.
Which string to choose?
One candidate is 0p1p, where p is the pumping length. A similar string worked for us before. However, it appears that this string can be pumped. Suppose we take x and z to be Λ, and y to be 0p1p.
Then:
xz = Λ
xyz = 0p1p
xyyz = 0p1p0p1p
xyyyz = 0p1p0p1p0p1p
xy . . . yz = . . .
So, no contradiction here.
All is not lost, however. Condition 3 in the pumping lemma tells us that we can restrict attention to the case where |xy| ≤ p, where p is the pumping length.
If we split 0p1p under this condition then y must consist of only 0’s.
Now xyiz, i ≥ 0 will only have the same number of 0’s and 1’s when i = 1.
So the pumping lemma has told us that we must be able to split 0p1p in a way which leads to a contradiction.
So L1 is not regular.
12.8 And another . . .
We will show that the language L2 = {ww | w ∈ {0, 1}∗} is non-regular.
Once again we assume that L2 is regular and use the pumping lemma to obtain a contradiction.
As usual we let p be the pumping length.
We can’t use the string 0p1p because it is not in the language. Why not try 0p0p? It is in the language, and it looks similar.
Notice that 0p0p = 02p.
Alas, however, we can find a way to pump 0p0p and stay in L2, even taking into account condition 3.
Suppose we take x to be Λ, and y to be a string of 0’s such that |y| ≤ p and |y| is even.
Then xyiz, i ≥ 0 will always be in L2.
So, no contradiction.
Note that every word in L2 has two equal left and right parts.
What is happening when we pump the string is that we are adding symbols to the left part.
Then, however, we are allowing ourselves to move half of them into the right part.
This results in a string with two equal left and right parts once again.
What we need to do is to make sure that we can’t do this rearrangement.
Consider the string 10p10p.
The pumping lemma tells us that we should be able to split this string up into xyz, such that y is just 0’s and xyiz, i ≥ 0 will be in L2.
Now we have our contradiction: xyiz is only in L2 for i = 1.
12.9 Comment
The moral of this story is that we will sometimes have to think a bit to find a string which will allow us to find a contradiction.
Chapter 13
Regular languages: summary
13.1 Regular languages: summary
We have covered quite a lot of material in this section of the paper, and most of it has been done carefully, and in some detail.
This material needs to be presented with some care: all the pieces fit together rather neatly, and much of the understanding depends on seeing how the delicate mechanism works.
As we have worked through the material we have taken a ‘bottom-up’ approach; now we will take a ‘top-down’ approach.
13.2 Kleene’s theorem
Kleene’s theorem is the most important result. Why?
A grammar is a way to generate strings in a language.
An automaton provides us with a way to recognise strings in a language.
Kleene’s theorem neatly relates the strength of the machine with the form of the grammar. It is not at all obvious that such a neat relationship should hold.
Second, Kleene’s theorem tells us that the deterministic and the non-deterministic variants of finite automata have just the same power.
Again it is not at all obvious that this should be the case.
It is important not just to know Kleene’s theorem as a fact, but also to know why Kleene’s theorem holds. In other words we need to know how the proofs go. Why?
First, the regular languages are only one class of language. There are similar results about other classes of language. Understanding how we showed Kleene’s theorem helps us understand the properties of these classes too.
Second, the proofs actually let us construct recognisers from generators, which turns out to be useful in itself.
Third, much of the mathematics that is used in these proofs is used elsewhere in computer science.
In fact the bulk of this section of the course was devoted to setting up the mechanism required to prove Kleene’s theorem.
Part II
Context Free Languages
Chapter 14
Introducing Context Free Grammars
14.1 Beyond Regular Languages

• Many languages of interest are not regular.
– {anbn | n ∈ N} is not regular.
FA cannot “count” higher than the number of states.
– {ααR | α ∈ A∗} is not regular when |A| > 1.
FA can’t match symbols in α with those in αR.
– The language of arithmetic expressions with brackets is not regular.
FA can’t check that the brackets match.

We now consider Context Free Languages, defined using Context Free Grammars.
• What is a Context Free Grammar (CFG)?
• How can we define languages using CFGs?
• How can we recognise Context Free Languages?
• What is the relationship between Context Free Languages and Regular Languages?
14.2 Sentences with Nested Structure
2 + 3 × ( 4 + 5 × 7 )
the boy hit the big ball
while c do while d do S
if (c) if (d) S; else T ;
if (c) if (d) S; else T ;
14.3 A Simple English Grammar

A sentence is a noun phrase followed by a verb phrase.
    S → NP VP
A noun phrase is a determiner followed by either a noun or an adjective phrase.
    NP → D N
    NP → D AP
An adjective phrase is an adjective followed by a noun.
    AP → A N
A verb phrase is a verb followed by a noun phrase.
    VP → V NP
A determiner is an article, e.g. “the”.
    D → the
A noun is a word denoting an object, e.g. “ball” or “boy”.
    N → boy
    N → ball
An adjective is a word denoting a property, e.g. “big”.
    A → big
A verb is a word denoting an action, e.g. “hit”.
    V → hit
14.4 Parse Trees
the boy hit the big ball
In bracketed form, the parse tree is:

(S (NP (D the) (N boy)) (VP (V hit) (NP (D the) (AP (A big) (N ball)))))
14.5 Context Free Grammars
Components of a grammar:
• Terminal symbols: “the”, “boy”, “ball”, etc.
The words which actually appear in sentences.
• Nonterminal symbols: “S”, “NP”, “VP”, “D”, “N” etc.
Names for components of sentences.
Never appear in sentences.
Distinguished nonterminal (S) identifies the language being defined.
• A finite set of Productions.
A production has the form:
nonterminal → definition
where definition is a string (possibly empty) of terminal and/or nonterminal symbols.
“→” is a metasymbol. It is part of the notation (metalanguage).
14.6 Formal definition of CFG
A Context Free Grammar (CFG) is a 4-tuple G = (Σ, N, S, P ) where:
• Σ is a finite set of terminal symbols (the alphabet).
• N is a finite set of nonterminal symbols, disjoint from Σ.
• S is a distinguished member of N , called the start symbol.
• P is a finite set of production rules of the form α→ β, where:
– α is a nonterminal symbol: α ∈ N ,
– β is a (possibly empty) string of terminal and/or nonterminal symbols: β ∈ (N ∪ Σ)∗.
Note: Some presentations of context free grammars do not allow rules with empty right-hand sides. We will discuss this restriction later.
Chapter 15
Regular and Context Free Languages
15.1 Regular and Context Free Languages
Kleene’s theorem tells us that all the following formalisms are equivalent in power:
• Regular expressions
• (Deterministic) Finite Automata
• Nondeterministic Finite Automata
• Nondeterministic Finite Automata with Λ Transitions
Now we have a new formalism: the Context Free Grammar.
How does its power compare with those above?
How is the class of Context Free Languages related to the class of regular languages?
CF = Reg? CF ⊆ Reg? CF ⊇ Reg? CF ∩ Reg = ∅?
15.2 CF ∩ Reg 6= ∅
The language EVEN-EVEN contains every string over the alphabet {a, b} with an even number of as and an even number of bs.
We saw earlier that it may be described by a r.e.:
EVEN-EVEN = Language((aa + bb + (ab + ba)(aa + bb)∗(ab + ba))∗)
Here is a context-free grammar for EVEN-EVEN:
S → SS     B → aa
S → BS     B → bb
S → SB     U → ab
S → Λ      U → ba
S → USU
(Proof : Cohen, p236)
So, at least one language is both context free and regular.
15.3 CF 6⊆ Reg
The language EQUAL contains every string over the alphabet {a, b} with an equal number of as and bs.
We proved earlier that EQUAL is not regular (pumping lemma).
Here is a context-free grammar for EQUAL:
S → aB     A → a      B → b
S → bA     A → aS     B → bS
           A → bAA    B → aBB
(Proof : Cohen, p239)
So, at least one language is context free but is not regular.
15.4 CF ⊇ Reg
In fact, every regular language is context free. To prove this, we show how to convert any FA into a context free grammar.
(We could have chosen to convert regular expressions, or NFAs into CFGs instead, since Kleene showed us they were all equivalent.)
• The alphabet Σ of the CFG is the same as the alphabet Σ of the FA.
• The set N of nonterminals of the CFG is the set Q of states of the FA.
• The start symbol S of the CFG is the start state q0 of the FA.
• For every X, Y ∈ Q, a ∈ Σ, there is a production X → aY in the CFG if and only if there is a transition from X to Y labelled a in the FA.
• For every X ∈ F in the FA there is a production X → Λ in the CFG.
(Proof : Cohen p260)
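This construction is mechanical enough to write down directly. A sketch in Python, applied to a small hypothetical two-state DFA (states E and O, accepting strings with an even number of 1s):

```python
def fa_to_cfg(delta, finals):
    """One production X -> aY per transition X --a--> Y,
    plus X -> Λ (the empty string here) per final state X."""
    prods = [(X, a + Y) for (X, a), Y in delta.items()]
    prods += [(X, "") for X in finals]
    return prods

delta = {("E", "0"): "E", ("E", "1"): "O",
         ("O", "0"): "O", ("O", "1"): "E"}
for lhs, rhs in fa_to_cfg(delta, {"E"}):
    print(lhs, "->", rhs if rhs else "Λ")
```

The start symbol of the resulting grammar is the DFA's start state, E.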
15.5 Example
The FA, in text form: states A (start, final), B (final), C, D (final), with transitions
A −a→ D, A −b→ B;  B −a→ D, B −b→ C;  C −a→ C, C −b→ C;  D −a→ D, D −b→ D.
The following CFG accepts the same language as the above FA:
Σ = {a, b}
N = {A,B,C,D}
S = A
P =
A → aD    B → aD    C → bC    D → aD
A → bB    B → bC    C → aC    D → bD
A → Λ     B → Λ               D → Λ
15.6 Regular Grammars
Grammars produced by the above process are called regular grammars.
Every production of a regular grammar has one of the forms:
N1 → T N2
N1 → T
(where N1, N2 ∈ N, and T ∈ Σ∗).
That is, the RHS of every production consists of a (possibly empty) sequence of terminals, possibly followed by a single nonterminal.
To prove that the class of languages accepted by regular grammars is exactly the class of regular languages, we need to show how to transform any regular grammar into an equivalent FA.
The transformation is similar to the one (which we omitted) turning FAs into regular expressions. See Cohen (p263) if you are interested in it.
15.7 Parsing using Regular Grammars
Because regular grammars are so much like finite automata, it is easy to generate (or parse) a sentence using a regular grammar.

S → aS (1)    T → bT (3)    U → aU (5)
S → T (2)     T → bU (4)    U → a (6)
To generate the string aabba, we can go through the following steps:
S ⇒1 aS ⇒1 aaS ⇒2 aaT ⇒3 aabT ⇒4 aabbU ⇒6 aabba

(The number on each ⇒ names the rule applied.)
This is called a derivation.
15.8 Derivations
Consider the CFG (Σ, N, S, P ).
A sentential form is a (possibly empty) string made up of nonterminals and terminals: that is, a string of type (Σ ∪ N)∗.
A derivation is a sequence α0 ⇒ · · · ⇒ αn in which:
• Each αi is a sentential form.
• α0 is the start symbol, S.
• αn is a string of type Σ∗ (i.e., there are no nonterminals left)
• For each pair αi ⇒ αi+1, we have:
– αi has the form β n δ for some nonterminal n ∈ N and sentential forms β and δ;
– there is a production (n → γ) ∈ P ; and
– αi+1 has the form β γ δ.
15.9 Derivations in Regular Grammars
A semiword is a sentential form of the restricted form Σ∗N: that is, a (possibly empty) string of terminals followed by a single nonterminal.
In a regular grammar, every production has as its right-hand side either a semiword or a word.
In any derivation, α0 is a semiword: it is a single nonterminal.
If some sentential form αi is a semiword, and we use a production from a regular grammar, αi+1 will also be either a semiword or a word.
Thus, to find a derivation using a regular grammar, we simply select a production whose left-hand side matches the (single) nonterminal in our sentential form, and repeat until a rule with no nonterminals on its right-hand side is used, at which stage the result is a word.
15.10 Derivations in Arbitrary CFGs
With regular grammars, every sentential form in a derivation is a semiword.
With arbitrary CFGs, productions can have multiple nonterminals on the right-hand side, and so sentential forms in derivations have multiple nonterminals too. Further, the terminals need not all occur at the start of a sentential form.
It is no longer just a matter of selecting a production to match our nonterminal; we must first decide which nonterminal to expand.
(1) S → XSY   (2) S → a   (3) X → bX   (4) X → c   (5) Y → d   (6) Y → e

Here are two derivations of the word bcbbcade:

S ⇒1 XSY ⇒3 bXSY ⇒4 bcSY ⇒1 bcXSYY ⇒3 bcbXSYY ⇒3 bcbbXSYY ⇒4 bcbbcSYY ⇒2 bcbbcaYY ⇒5 bcbbcadY ⇒6 bcbbcade

S ⇒1 XSY ⇒1 XXSYY ⇒5 XXSdY ⇒3 XbXSdY ⇒3 bXbXSdY ⇒2 bXbXadY ⇒3 bXbbXadY ⇒4 bcbbXadY ⇒6 bcbbXade ⇒4 bcbbcade
The leftmost nonterminal in a sentential form is the first nonterminal that we encounter when scanning left to right.
A leftmost derivation of word w from a CFG is a derivation in which, at each step, a production is applied to the leftmost nonterminal in the current sentential form.
Of the two derivations of the word bcbbcade on the previous slide, the first is a leftmost derivation; the second is not.
Theorem: Any word that can be generated from a given CFG by some derivation can be generated by a leftmost derivation.
15.11 Parse Trees
We saw parse trees informally in our first lecture.
We now want to define parse trees more formally, by looking at the relationship to derivations.
Given G = (Σ, N, S, P ) and w ∈ Σ∗, a parse tree for w from G is an ordered, labelled tree such that:
• Each leaf node is labelled with an element of Σ.
• Each non-leaf node is labelled with an element of N .
• The root is labelled with the start symbol, S.
• For each non-leaf node, n, if α is the label on n and γ1, . . . , γk are the labels on its children, then α → γ1 · · · γk is a rule in P.
• The fringe of the tree is w.
15.12 Derivations and parse trees
A partial parse tree is an ordered, labelled tree which is like a parse tree, except that it may have nonterminals or terminals at its leaves.
The fringe of a partial parse tree is a sentential form.
A derivation α0 ⇒ · · · ⇒ αn corresponds to a sequence of partial parse trees t0, . . . , tn such that the fringe of ti is αi, and each ti+1 is obtained from ti by replacing a single leaf node (labelled with a nonterminal symbol n) by a tree whose root is n.
Since αn ∈ Σ∗, the final partial parse tree tn is a parse tree for the word αn, corresponding to the derivation α0, . . . , αn.
15.13 Ambiguity

A word may have several different parse trees, each corresponding to a different leftmost derivation.
(1) (2) (3) (4) (5)S → XY X → b X → Xa Y → b Y → aY
S ⇒1 XY ⇒2 bY ⇒5 baY ⇒5 baaY ⇒4 baab
S ⇒1 XY ⇒3 XaY ⇒2 baY ⇒5 baaY ⇒4 baab
S ⇒1 XY ⇒3 XaY ⇒3 XaaY ⇒2 baaY ⇒4 baab
A grammar in which there are sentences with multiple parse trees is called ambiguous.
(1) E → E “+” E   (2) E → E “×” E   (3), (4), (5) E → “1” | “2” | “3”

The word 1 + 2 × 3 has two parse trees: one grouping it as (1 + 2) × 3, the other as 1 + (2 × 3).
It is often possible to transform an ambiguous grammar so that it unambiguously defines the same language.
(1) E → E “+” T   (2) E → T   (3) T → T “×” F   (4) T → F   (5), (6), (7) F → “1” | “2” | “3”

With this grammar, 1 + 2 × 3 has exactly one parse tree, grouping it as 1 + (2 × 3).
Chapter 16
Normal Forms
16.1 Lambda Productions
It is sometimes convenient to include productions of the form A → Λ in a CFG.
For example, here are two CFGs, both defining the language described by the regular expression a∗b+:

S → AB        S → AB
A → Λ         S → B
A → aA        A → aA
B → b         A → a
B → bB        B → b
              B → bB
The grammar on the left is shorter and easier to understand; however, the lambda production A → Λ causes problems for parsing.
16.2 Eliminating Λ Productions
Suppose a CFG has a production A → Λ, which we wish to remove. Suppose that A is not the start symbol.
Any derivation that uses this production must also have used some production B → β A δ, for some nonterminal B (possibly A itself).
If we add to the grammar the single production B → β δ, this instance of the Lambda production is unnecessary, but the language accepted is the same.
If we carry out that process for every occurrence of A on the right-hand side of any production, we may eliminate A → Λ altogether.
For example, we take:
S → AB, A→ Λ, A→ aA, B → b, B → bB
and, noting that A occurs on the right-hand side of two productions, we add:
S → B, A→ a
Now, A → Λ is redundant, and we may remove it; the result is as given before.
However, the process may run into some problems.
16.3 Circularities
First, the process of eliminating a Λ production may introduce new ones. Consider:
S → aTU, T → Λ, T → U, U → T
This is a (rather long-winded) grammar for the language consisting of just the string a.
To remove T → Λ, we must add S → aU and U → Λ; the result is:
S → aTU, S → aU, T → U, U → T, U → Λ
Then, to remove U → Λ, we must add S → aT and T → Λ: and so we are no better off than where we started.
The solution is to remove the potential Lambda productions for T and U concurrently.
That is, we note that U, though not directly defined by a Lambda production, is nullable: there is a derivation
U ⇒ · · · ⇒ Λ
T is trivially nullable, because it has a Lambda production of its own.
We must add new productions for every possible nullable nonterminal on the right-hand side of any production.
S → aTU, T → Λ, T → U, U → T
T and U are both nullable, so we must add:
S → aU, accounting for T ⇒ Λ;
S → aT, accounting for U ⇒ Λ; and
S → a, accounting for TU ⇒ Λ.
Now, we no longer need the Lambda productions, so we get
S → aTU, S → aU, S → aT, S → a, T → U, U → T
Aside: At this point, we could note that T and U are useless, since no derivation involving either of them can terminate.
They can thus be removed, leaving just S → a.
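The whole procedure (compute the nullable nonterminals by a fixpoint, then add a production for every way of dropping nullable occurrences) can be sketched in Python. Productions here are (lhs, rhs-tuple) pairs; the representation is our own:

```python
from itertools import combinations

def nullable_set(prods):
    """Fixpoint: lhs is nullable if some rhs has every symbol nullable
    (in particular an empty rhs, i.e. a Lambda production)."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def remove_lambda(prods, start):
    nullable = nullable_set(prods)
    new = set()
    for lhs, rhs in prods:
        idxs = [i for i, s in enumerate(rhs) if s in nullable]
        for r in range(len(idxs) + 1):
            for drop in combinations(idxs, r):   # drop any subset of nullables
                kept = tuple(s for i, s in enumerate(rhs) if i not in drop)
                if kept or lhs == start:         # no Λ rules except maybe on the start
                    new.add((lhs, kept))
    return new

# S -> aTU, T -> Λ, T -> U, U -> T  (uppercase = nonterminal)
prods = [("S", ("a", "T", "U")), ("T", ()), ("T", ("U",)), ("U", ("T",))]
for lhs, rhs in sorted(remove_lambda(prods, "S")):
    print(lhs, "->", "".join(rhs) if rhs else "Λ")
```

On this grammar the result is S → aTU, aU, aT, a together with T → U and U → T, matching the productions derived above.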
16.4 Λ Productions on the Start Symbol
Consider the CFG:
S → Λ, S → aS
It accepts the language described by the r.e. a∗.
If we attempt to remove S → Λ, we note that S is nullable, and add S → a, resulting in:
S → a, S → aS
However, this CFG now accepts Language(a+): the empty string is no longer allowed.
The same problem will arise any time S is nullable, not just when it is directly involved in a Lambda production.
We can make sure that the start symbol is never subjected to Lambda removal by first transforming our grammar so that the start symbol occurs exactly once, on the left-hand side of the first production.
Let G = (Σ, N, S, P ) be a CFG, with
{(S → α1), (S → α2), . . . , (S → αn)} ⊆ P
being all the productions for S; let S′ ∉ N be a brand new nonterminal.
The CFG G′ = (Σ, N ∪ {S′}, S′, P ∪ {S′ → S}) is equivalent to G, and may be safely subjected to Lambda removal.
If S was nullable, the additional production S′ → Λ will be generated; of course, this Lambda production must not be removed.
16.5 Example
S → cS, S → TU, T → Λ, T → aT, U → Λ, U → bTU
accepts the language L = Language(c∗a∗(ba∗)∗). Note that Λ ∈ L.
First introduce a new start symbol S′:
S′ → S, S → cS, S → TU, T → Λ, T → aT, U → Λ, U → bTU
Now, note that S, T , and U are all nullable, so add:
S′ → Λ, S → c, S → T, S → U,T → a, U → bT, U → bU, U → b
The result (with Λ productions removed) also accepts L.
16.6 Unit Productions
Definition: A unit production is a production n1 → n2, where n1, n2 ∈ N.
For example, consider S → AB, A → B, B → b.
The grammar accepts the language {bb}; the A nonterminal, with its unit production A → B, is merely a distraction, and the grammar could more informatively be written S → AA, A → b.
Any leftmost derivation using a unit production n1 → n2 must include sentential forms σ1 = α n1 β and σ2 = α n2 β (α ∈ Σ∗, β ∈ (Σ ∪ N)∗).
Since n2 is itself a nonterminal, the derivation must also include σ3 = α γ β, where (n2 → γ) ∈ P.
If we add the production n1 → γ, the unit production becomes unnecessary, and the derivation can go directly from σ1 to σ3.
However, once again we can get circularities. For example, if P contains both n1 → n2 and n2 → n1, we will indefinitely replace one by the other.
Instead, we use the following rule:
For every pair of nonterminals n1 and n2 such that n1 ⇒ · · · ⇒ n2, and for every nonunit production n2 → γ, add the production n1 → γ.
As long as all such replacements are done simultaneously, the unit productions may safely be eliminated.
See Cohen (pp273ff) for discussion and an example.
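The simultaneous-replacement rule can be implemented by first computing, for each nonterminal n1, every n2 with n1 ⇒ · · · ⇒ n2 through unit productions, and then copying across the nonunit productions. A Python sketch on the S → AB, A → B, B → b example (the (lhs, rhs-tuple) representation is our own):

```python
def remove_units(prods):
    """prods: set of (lhs, rhs-tuple); nonterminals are the lhs symbols."""
    nts = {lhs for lhs, _ in prods}
    # reach[n] = nonterminals reachable from n via unit productions
    reach = {n: {n} for n in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if len(rhs) == 1 and rhs[0] in nts:        # a unit production
                for n in nts:
                    if lhs in reach[n] and rhs[0] not in reach[n]:
                        reach[n].add(rhs[0])
                        changed = True
    # copy every nonunit production of a reachable nonterminal
    new = set()
    for n in nts:
        for lhs, rhs in prods:
            if lhs in reach[n] and not (len(rhs) == 1 and rhs[0] in nts):
                new.add((n, rhs))
    return new

prods = {("S", ("A", "B")), ("A", ("B",)), ("B", ("b",))}
for lhs, rhs in sorted(remove_units(prods)):
    print(lhs, "->", "".join(rhs))
```

On this grammar the result is S → AB, A → b, B → b, with the unit production A → B gone.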
16.7 Example
We continue with the example from Slide 36.
S′ → S | Λ
S → cS | TU | c | T | U
T → aT | a
U → bTU | bT | bU | b

Direct unit productions are S′ → S, S → T, S → U.
Indirectly, we also have S′ ⇒ T, S′ ⇒ U.
From S′ ⇒ S with S → cS | TU | c we get S′ → cS | TU | c.
From S′ ⇒ T with T → aT | a we get S′ → aT | a.
From S′ ⇒ U with U → . . . we get S′ → bTU | bT | bU | b.
From S ⇒ T with T → aT | a we get S → aT | a.
From S ⇒ U with U → . . . we get S → bTU | bT | bU | b.
In summary:

S′ → Λ | cS | TU | c | aT | a | bTU | bT | bU | b
S → cS | TU | c | aT | a | bTU | bT | bU | b
T → aT | a
U → bTU | bT | bU | b
Chapter 17
Recursive Descent Parsing
17.1 Recognising CFLs (Parsing)

How can we determine whether a string w ∈ Σ∗ is in the language generated by a CFG, G = (Σ, N, S, P )?
We know that w ∈ L(G) iff there is a derivation for w from S – equivalently, if there is a parse tree with root S and fringe w.
To determine whether w ∈ L(G), we try to build a parse tree for w.
Top-down: Build parse tree starting from the root. At each step, choose:
– A nonterminal, N, in the fringe to expand.
– A rule with N as its lhs to apply.
Bottom-up: Build parse tree starting from the leaves and working upwards to the root. At each step, choose:
– A substring, α, of the current string to reduce.
– A rule with α as its rhs to apply.
17.2 Top-down Parsing
Consider the following grammar for nested lists of numbers:
L → “(” “)” | “(” B “)”   (1), (2)
B → E | E “,” B           (3), (4)
E → L | 1 | 2 | 3         (5), (6), (7), (8)

Let’s try to parse the input (1,()).
17.3 Top-down Parsing
Here is the leftmost derivation:
L ⇒ ( B ) ⇒ ( E , B ) ⇒ ( 1 , B ) ⇒ ( 1 , E ) ⇒ ( 1 , L ) ⇒ ( 1 , ( ) )

Working left to right: at each step, either expand the leftmost non-terminal, or match the leftmost terminal with an input symbol.
This is called LL(1) parsing.
17.4 Recursive Descent Parsing
Recursive Descent is a technique for building “one-off” LL(1) parsers.
• Parser is a set of mutually recursive procedures, with one procedure corresponding to each non-terminal.
• Each procedure decides which rule to use and looks for the symbols in the rhs of that rule.
• Matches the rhs from left to right.
• Must be able to choose rule by looking at next input.
17.5 Building a Parser for Nested Lists
• LL(1) grammar for nested lists:
List → “(” RestList              (1)
RestList → “)” | ListBody “)”    (2), (3)
ListBody → ListElt RestBody      (4)
RestBody → Λ | “,” ListBody      (5), (6)
ListElt → number | List          (7), (8)

• Parser will have one procedure for each nonterminal: ParseList, ParseRestList, etc.
• Input is a sequence of symbols, ss, recognised by a scanner, which also removes white space.
The symbols used are:
Symbol Kind    Symbol
lparensym      “(”
rparensym      “)”
commasym       “,”
numbersym      (0|1|2|3|4|5|6|7|8|9)+
17.6 Parser for Nested Lists
Rule: List→ “(” RestList (1)
procedure ParseList (in out ss);
begin
  if head(ss) = lparensym then
    ss := tail(ss); ParseRestList(ss)
  else Error
end ParseList
Rule: RestList→ “)” | ListBody “)” (2), (3)
procedure ParseRestList (in out ss);
begin
  if head(ss) = rparensym then
    ss := tail(ss)
  else
    ParseListBody(ss);
    if head(ss) = rparensym then ss := tail(ss) else Error
end ParseRestList
Rule: ListBody → ListElt RestBody (4)
procedure ParseListBody(in out ss);
begin
  ParseListElt(ss);
  ParseRestBody(ss)
end ParseListBody
Rule: RestBody → Λ | “,” ListBody (5), (6)
procedure ParseRestBody(in out ss);
begin
  if head(ss) = commasym then
    ss := tail(ss);
    ParseListBody(ss)
  else
    skip
end ParseRestBody
Rule: ListElt→ number | List (7), (8)
procedure ParseListElt(in out ss);
begin
  if head(ss) = numbersym then
    ss := tail(ss)
  else
    ParseList(ss)
end ParseListElt
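The pseudocode above translates almost line-for-line into a working parser. A Python sketch, passing the remaining symbols around as a list instead of an in-out parameter (the scanner is a minimal stand-in for the one assumed above):

```python
import re

def scan(text):
    """Tokenise into (kind, value) pairs, skipping white space."""
    tokens = []
    for m in re.finditer(r"\s*(\(|\)|,|\d+)", text):
        t = m.group(1)
        kind = {"(": "lparensym", ")": "rparensym", ",": "commasym"}.get(t, "numbersym")
        tokens.append((kind, t))
    return tokens

def parse_list(ss):                  # List -> "(" RestList
    if not ss or ss[0][0] != "lparensym":
        raise SyntaxError("expected (")
    return parse_rest_list(ss[1:])

def parse_rest_list(ss):             # RestList -> ")" | ListBody ")"
    if ss and ss[0][0] == "rparensym":
        return ss[1:]
    ss = parse_list_body(ss)
    if not ss or ss[0][0] != "rparensym":
        raise SyntaxError("expected )")
    return ss[1:]

def parse_list_body(ss):             # ListBody -> ListElt RestBody
    return parse_rest_body(parse_list_elt(ss))

def parse_rest_body(ss):             # RestBody -> Λ | "," ListBody
    if ss and ss[0][0] == "commasym":
        return parse_list_body(ss[1:])
    return ss

def parse_list_elt(ss):              # ListElt -> number | List
    if ss and ss[0][0] == "numbersym":
        return ss[1:]
    return parse_list(ss)

assert parse_list(scan("(1, ())")) == []   # empty remainder: input accepted
```

Each procedure returns the unconsumed symbols, which plays the role of the in-out parameter ss.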
17.7 Building a Parse Tree
How can we construct the parse tree?
• Insert code to collect the components corresponding to the RHS of the rule applied.
• Add code to “apply rule” at end of code that checks each rule.
• Scanner returns symbol value as well as symbol kind.
• Parser procedure returns tree as well as consuming input.
if head(ss).type = numbersym then
  t := Tree(“ListElt”, 〈Leaf(head(ss).value)〉);
  ss := tail(ss)
else
  ParseList(ss, u);
  t := Tree(“ListElt”, 〈u〉)
end ParseListElt
17.9 LL(1) Grammars
Recursive Descent is an LL(1) parsing technique:
• Leftmost derivations
• Left-to-right scanning of input
• 1 symbol lookahead.
This only works for some grammars:
Requirement 1:
Two productions for the same nonterminal cannot produce strings that start with the same terminal.
Requirement 2:
If a nonterminal can produce Λ, it cannot start with any terminal that can also follow it.
17.10 First and Follow sets
For any grammar (Σ, N, S, P ) and sentential form α ∈ (Σ ∪N)∗:
first(α) = {x ∈ Σ | α ⇒ xψ for some ψ ∈ Σ∗}
follow(α) = {x ∈ Σ | S ⇒ φαxψ for some φ, ψ ∈ (Σ ∪ N)∗}

That is, first(α) is all those terminals that can appear first in any string derived from α, and follow(α) is all those terminals that can appear immediately after α in any sentential form derived from the start symbol.

Now, the LL(1) requirements are:
Requirement 1: If N → α and N → β, then first(α) ∩ first(β) = ∅.
Requirement 2: If N ⇒ Λ, then first(N) ∩ follow(N) = ∅.
Consider the grammar for arithmetic expressions:

E → T | E + T | E − T    (1), (2), (3)
T → F | T × F | T/F      (4), (5), (6)
F → id | (E)             (7), (8)

This breaks LL(1) requirement (1): the three productions for E can all produce strings beginning with the same terminal (id or “(”), so one symbol of lookahead cannot choose between them.
A grammar such as S → aSa | bSb | Λ instead breaks LL(1) requirement (2), because S ⇒ Λ but first(S) = follow(S) = {a, b}.
If the parser sees a as the next symbol, it cannot decide whether to do Sa ⇒ Λa = a or Sa ⇒ aSaa.
• Some CFLs can be parsed deterministically bottom up, but not top down.
E.g. {ax+ybxcy | x, y ≥ 0}: S → Λ | aTb | aSc, T → Λ | aTb.
This breaks LL(1) condition (1) and can’t be left-factored: a top down parser can’t tell whether an initial a will eventually match a b or a c.
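first and follow can be computed by a fixpoint over the productions. A sketch of the first-set half in Python, run on the palindrome-style grammar S → aSa | bSb | Λ (uppercase symbols are nonterminals; follow is computed by a similar fixpoint, omitted here):

```python
def first_sets(prods, nts):
    """Fixpoint: grow first(N) and the nullable set until stable."""
    first = {n: set() for n in nts}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            all_null = True
            for s in rhs:
                f = first[s] if s in nts else {s}   # first of a terminal is itself
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                if s not in nullable:               # stop at the first non-nullable symbol
                    all_null = False
                    break
            if all_null and lhs not in nullable:
                nullable.add(lhs)
                changed = True
    return first, nullable

# S -> aSa | bSb | Λ
prods = [("S", ("a", "S", "a")), ("S", ("b", "S", "b")), ("S", ())]
first, nullable = first_sets(prods, {"S"})
assert first["S"] == {"a", "b"} and "S" in nullable
```

Since a and b also follow S in sentential forms like aSa, this confirms first(S) ∩ follow(S) ≠ ∅, the requirement-2 violation discussed above.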
Chapter 18
Pushdown Automata
18.1 Finite and Infinite Automata
L0 = {ambn | m,n ≥ 0} is regular: a∗b∗ is its regular expression.
The NFAΛ, in text form: states S0 (start) and S1 (final); an a-loop on S0, a Λ-transition from S0 to S1, and a b-loop on S1.
L1 = {anbn | n ≥ 0} is not regular (pumping lemma), but it is context free: S → aSb | Λ is its CFG.
If we augment L0’s NFAΛ with a counter, a “super-NFA” can recognize L1: the a-loop on S0 now performs c := c + 1, the b-loop on S1 performs c := c − 1, and S1 accepts only when c = 0.
L2 = {w1w2 | w1, w2 ∈ {a, b}∗} is regular: (a + b)∗(a + b)∗ is its regular expression.
In text form: states S0 (start) and S1 (final); a- and b-loops on S0, a Λ-transition from S0 to S1, and a- and b-loops on S1.
L3 = {wwR | w ∈ {a, b}∗} is not regular, but it is context free: S → aSa | bSb | Λ is its CFG.
If we augment L2’s NFAΛ with a stack s, the machine can recognize L3: the loops on S0 read a or b and push it (s.push(a), s.push(b)); the loops on S1 read a or b and check it against the popped symbol (s.pop() = a, s.pop() = b); and S1 accepts only when s.isempty().
A Pushdown Automaton is simply an NFA augmented with a stack.
18.2 Pushdown Automata
A PDA is an NFAΛ with a stack.
At each transition, we can:
• read a symbol x;
• pop a symbol y from the stack; and
• push a symbol z onto the stack.
We draw this as follows:
q1 --x;y;z--> q2
Any of x, y, z may be Λ, indicating that nothing is read, popped, or pushed at that transition.

We label start states with − and final states with +, as before; but this time, a final state is accepting only if the stack is empty.
Our NPDA for L3 = {wwR} is now written:
S0 (start) --a;Λ;a--> S0
S0 --b;Λ;b--> S0
S0 --Λ;Λ;Λ--> S1 (final)
S1 --a;a;Λ--> S1
S1 --b;b;Λ--> S1
A stack may also serve as a counter, by stacking and matching some arbitrary symbol (say #); so L1 = {anbn} is:
S0 (start) --a;Λ;#--> S0
S0 --Λ;Λ;Λ--> S1 (final)
S1 --b;#;Λ--> S1
L4 = {anb2n | n ≥ 0} (S → aSbb | Λ)
S0 (start) --a;Λ;#--> S1
S1 --Λ;Λ;#--> S0
S0 --Λ;Λ;Λ--> S2 (final)
S2 --b;#;Λ--> S2
L5 = {anb⌊n/2⌋ | n ≥ 0}   (S → aT | T,   T → aaTb | Λ)
S0 (start) --a;Λ;#--> S0
S0 --Λ;Λ;Λ--> S1 (final)
S1 --Λ;#;Λ--> S2 (final)
S2 --b;#;Λ--> S1
L6 = {ambncm+n | m, n ≥ 0}   (S → aSc | T,   T → bTc | Λ)
S0 (start) --a;Λ;#--> S0
S0 --Λ;Λ;Λ--> S1
S1 --b;Λ;#--> S1
S1 --Λ;Λ;Λ--> S2 (final)
S2 --c;#;Λ--> S2
L7 = {ambncm−n | m ≥ n ≥ 0}   (S → aSc | T,   T → aTb | Λ)
S0 (start) --a;Λ;#--> S0
S0 --Λ;Λ;Λ--> S1 (final)
S1 --b;#;Λ--> S1
S1 --Λ;Λ;Λ--> S2 (final)
S2 --c;#;Λ--> S2
18.3 Formal Definition of PDA
P = (Σ,Γ, Q, q0, F, δ)
• Σ is the alphabet of input symbols, which can appear in sentences recog-nized by the PDA.
• Γ is the alphabet of symbols that can appear on the stack: may or maynot be the same as Σ.
• Q is the set of states.
• q0 ∈ Q is the start state.
• F ⊆ Q is the set of final states.
• δ is the transition function, which will (nondeterministically) map a state,an input symbol (or Λ), and a stack symbol (or Λ) to a new state and anew sequence of stack symbols:
δ : Q× (Σ ∪ {Λ})× (Γ ∪ {Λ}) → 2Q×Γ∗
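This transition function can be simulated directly. The following sketch (mine, not from the slides) searches the space of configurations (state, input position, stack) breadth-first, accepting in a final state with the input exhausted and the stack empty; it terminates only for PDAs that, like the one below, have no cycle of Λ-transitions that pushes:

```python
# A sketch (not from the slides) of nondeterministic PDA acceptance by
# breadth-first search over configurations (state, position, stack).
# Transitions are tuples (state, read, pop, new_state, push), with ''
# playing the role of Lambda. Assumes no Lambda-push cycles.
from collections import deque

def accepts(transitions, start, finals, w):
    """True iff some path reads all of w, ends final with an empty stack."""
    seen = set()
    queue = deque([(start, 0, '')])       # stack kept as a string, top at left
    while queue:
        q, i, stack = queue.popleft()
        if (q, i, stack) in seen:
            continue
        seen.add((q, i, stack))
        if q in finals and i == len(w) and stack == '':
            return True
        for (p, read, pop, p2, push) in transitions:
            if p != q:
                continue
            if read and (i >= len(w) or w[i] != read):
                continue
            if pop and (not stack or stack[0] != pop):
                continue
            new_stack = push + (stack[1:] if pop else stack)
            queue.append((p2, i + (1 if read else 0), new_stack))
    return False

# The NPDA for L3 = {w w^R}: push in state 0, guess the middle, match in 1.
l3 = [
    ('0', 'a', '', '0', 'a'), ('0', 'b', '', '0', 'b'),
    ('0', '', '', '1', ''),
    ('1', 'a', 'a', '1', ''), ('1', 'b', 'b', '1', ''),
]
print(accepts(l3, '0', {'1'}, 'abba'))   # True
print(accepts(l3, '0', {'1'}, 'aba'))    # False (odd length)
```

The breadth-first search is exactly the nondeterminism of the machine made explicit: every enabled transition spawns a new configuration to explore.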
18.4 Deterministic and Nondeterministic PDAs
PDAs may be deterministic or nondeterministic, much like finite acceptors. A deterministic PDA is one in which every input string has a unique path through the machine.
This means that at each state, it must be possible to deterministically decide whether to take a transition that:
• reads a symbol from the input, but pops nothing from the stack (x; Λ; ?)
• reads no input, but pops a symbol from the stack (Λ; y; ?);
• both reads and pops simultaneously (x; y; ?); or
• neither reads nor pops (Λ; Λ; ?).
Consider again the PDA for L6 = {ambncm+n}:
S0 (start) --a;Λ;#--> S0
S0 --Λ;Λ;Λ--> S1
S1 --b;Λ;#--> S1
S1 --Λ;Λ;Λ--> S2 (final)
S2 --c;#;Λ--> S2
Applying the same algorithm as we used for NFAΛ → FA:
S0 (start, final) --a;Λ;#--> S012
S0 --b;Λ;#--> S12
S0 --c;#;Λ--> S2
S012 (final) --a;Λ;#--> S012
S012 --b;Λ;#--> S12
S012 --c;#;Λ--> S2
S12 (final) --b;Λ;#--> S12
S12 --c;#;Λ--> S2
S2 (final) --c;#;Λ--> S2
The resulting PDA is deterministic, and accepts L6. However, this does not happen with L3 = {wwR}:
S0 (start) --a;Λ;a--> S0
S0 --b;Λ;b--> S0
S0 --Λ;Λ;Λ--> S1 (final)
S1 --a;a;Λ--> S1
S1 --b;b;Λ--> S1

Attempting the same algorithm merges S0 with the Λ-reachable S1 into a combined state that is both start and final and carries all four stack actions: a;Λ;a, b;Λ;b, a;a;Λ and b;b;Λ. When the next input symbol is a and an a is on top of the stack, both a;Λ;a (keep pushing) and a;a;Λ (start popping) are enabled, and similarly for b: the nondeterministic guess of where the middle of the string lies survives the construction, so the result is not deterministic.
This language cannot be parsed deterministically.
18.5 CFG ⊆ PDA
Every language generated by a CFG may be accepted by a PDA.
The proof is by construction. We will in fact show how to build two different PDAs (corresponding to top-down and bottom-up parsers) for every CFG.

For both constructions, we suppose we have a CFG G = (Σ, N, S, P) and we will construct a PDA P = (Σ, Γ, Q, q0, F, δ).
18.6 Top-Down construction
Σ = Σ,   Γ = N ∪ Σ,   Q = {0, 1},   q0 = 0,   F = {1}
• δ(0,Λ,Λ) 7→ (1, S)
• For each x ∈ Σ, δ(1, x, x) 7→ (1,Λ) (match)
• For each (X → α) ∈ P , δ(1,Λ, X) 7→ (1, α) (expand)
0 (start) --Λ;Λ;S--> 1 (final)
1 --x;x;Λ--> 1   (match)
1 --Λ;X;α--> 1   (expand)
18.7 S → aS | T ; T → b | bT
0 (start) --Λ;Λ;S--> 1 (final)
1 --a;a;Λ--> 1   (match)
1 --b;b;Λ--> 1   (match)
1 --Λ;S;aS--> 1   (expand)
1 --Λ;S;T--> 1   (expand)
1 --Λ;T;b--> 1   (expand)
1 --Λ;T;bT--> 1   (expand)
state   input   stack (top to left)   action
0       abb     Λ
1       abb     S                     expand
1       abb     aS                    match
1       bb      S                     expand
1       bb      T                     expand
1       bb      bT                    match
1       b       T                     expand
1       b       b                     match
1       Λ       Λ                     accept
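The expand/match machine can be run by searching over (input position, stack) pairs. This is a sketch of mine, not from the slides; the pruning bound relies on the fact that, in this particular grammar, every stack symbol derives at least one terminal:

```python
# A sketch (not from the slides): the top-down expand/match PDA for
# S -> aS | T, T -> b | bT, simulated by breadth-first search over
# (position, stack) pairs. The stack is a string with its top at the left.
from collections import deque

PRODS = {'S': ['aS', 'T'], 'T': ['b', 'bT']}

def accepts(w):
    queue, seen = deque([(0, 'S')]), set()
    while queue:
        i, stack = queue.popleft()
        # Prune: every stack symbol here derives >= 1 terminal, so a stack
        # longer than the remaining input (plus slack) can never succeed.
        if (i, stack) in seen or len(stack) > len(w) - i + 1:
            continue
        seen.add((i, stack))
        if i == len(w) and stack == '':
            return True
        if stack == '':
            continue
        top, rest = stack[0], stack[1:]
        if top in PRODS:                   # expand
            for rhs in PRODS[top]:
                queue.append((i, rhs + rest))
        elif i < len(w) and w[i] == top:   # match
            queue.append((i + 1, rest))
    return False

print(accepts('abb'))   # True, as in the trace above
print(accepts('a'))     # False: T always produces at least one b
```

Each loop iteration performs exactly one expand or match step from the table above; the queue holds the alternatives a nondeterministic machine would explore in parallel.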
18.8 Bottom-Up construction
Σ = Σ,   Γ = N ∪ Σ,   Q ⊇ {qp, qf},   q0 = qp,   F = {qf}
• For each x ∈ Σ, δ(qp, x,Λ) 7→ (qp, x) (shift)
• For each (X → α) ∈ P , where α = α0, . . . , αn:
– create new states {q1, . . . , qn};
– δ(qp,Λ, αn) 7→ (qn,Λ)
– δ(qn,Λ, αn−1) 7→ (qn−1,Λ)
– · · ·
– δ(q1,Λ, α0) 7→ (qp, X) (reduce)
• δ(qp,Λ, S) 7→ (qf ,Λ)
18.9 S → aS | T ; T → b | bT
qp (start) --a;Λ;a--> qp   (shift)
qp --b;Λ;b--> qp   (shift)
qp --Λ;b;T--> qp   (reduce T → b)
qp --Λ;T;S--> qp   (reduce S → T)
qp --Λ;S;Λ--> q′,  q′ --Λ;a;S--> qp   (reduce S → aS)
qp --Λ;T;Λ--> q′′,  q′′ --Λ;b;T--> qp   (reduce T → bT)
qp --Λ;S;Λ--> qf (final)   (accept)

(The intermediate states are unnamed in the slide; we call them q′ and q′′ here.)
state   input   stack (top to left)   action
qp      abb     Λ                     shift
qp      bb      a                     shift
qp      b       ba                    shift
qp      Λ       bba                   reduce
qp      Λ       Tba                   reduce
qp      Λ       Ta                    reduce
qp      Λ       Sa                    reduce
qp      Λ       S                     accept
qf      Λ       Λ
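The shift/reduce machine admits a similar sketch (again mine, not from the slides), with the stack kept as a string whose top is at the left, as in the trace:

```python
# A sketch (not from the slides): nondeterministic shift-reduce recognition
# for S -> aS | T, T -> b | bT, searching over (position, stack) pairs.
from collections import deque

PRODS = [('S', 'aS'), ('S', 'T'), ('T', 'b'), ('T', 'bT')]

def accepts(w):
    queue, seen = deque([(0, '')]), set()
    while queue:
        i, stack = queue.popleft()        # stack as a string, top at the left
        if (i, stack) in seen:
            continue
        seen.add((i, stack))
        if i == len(w) and stack == 'S':
            return True                    # whole input reduced to the start symbol
        if i < len(w):                     # shift the next input symbol
            queue.append((i + 1, w[i] + stack))
        for lhs, rhs in PRODS:             # reduce: top of stack is rhs reversed
            if stack.startswith(rhs[::-1]):
                queue.append((i, lhs + stack[len(rhs):]))
    return False

print(accepts('abb'))   # True, matching the trace above
```

Because symbols are pushed as they are shifted, a right-hand side sits on the stack reversed, which is why the reduce step matches `rhs[::-1]`.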
18.10 PDA ⊆ CFG
Every language accepted by a PDA may be generated by a CFG. We must show how to construct a CFG G = (Σ, N, S, P) from an arbitrary PDA P = (Σ, Γ, Q, q0, F, δ).
To make the construction simpler, we suppose:

• |F| = 1, i.e. P has only one accepting state. If |F| > 1, add new states q′, qf ∉ Q, put F = {qf}, and, for some new stack symbol y′ ∉ Γ and for each q ∈ F, add transitions q --Λ;Λ;y′--> q′ --Λ;y′;Λ--> qf.
• Every transition either pushes one stack symbol, or pops one stack symbol, but not both.
– Replace any transition q1 --x;y;z--> q3 that has both y and z not Λ (x ∈ Σ ∪ {Λ}) by the transitions q1 --x;y;Λ--> q2 --Λ;Λ;z--> q3, for some new state q2 ∉ Q.

– Replace any transition q1 --x;Λ;Λ--> q3 by the transitions q1 --x;Λ;y′--> q2 --Λ;y′;Λ--> q3, for some new state q2 ∉ Q and stack symbol y′ ∉ Γ (pushing y′ and immediately popping it leaves the stack unchanged).
Each nonterminal Apq in the CFG represents a sequence of transitions from state p to state q, with no net change to the stack.
Note that the first transition in the sequence must be a push, and the last must be a pop: p --x;Λ;y--> r · · · s --x′;z;Λ--> q.
Case 1: y = z. Put Apq → xArsx′.

Case 2: y ≠ z. There must be some intermediate transition which pops the y pushed by the first transition: p --x;Λ;y--> r · · · r′ --x′′;y;Λ--> s′ · · · s --x′;z;Λ--> q. Put Apq → Aps′As′q.
18.11 PDA to CFG Formally
We transform P = (Σ, Γ, Q, q0, {qf}, δ) to G = (Σ, N, S, P).
Let N = Q × Q: nonterminals of the grammar are pairs of states of the automaton. For convenience, we write Apq for the pair (p, q).
Let S = Aq0qf.
1. For each p ∈ Q, put App → Λ in P .
2. For each p, q, r ∈ Q, put Apq → AprArq in P .
3. For each p, q, r, s ∈ Q, y ∈ Γ, and x, x′ ∈ Σ ∪ {Λ}: if δ contains transitions p --x;Λ;y--> r and s --x′;y;Λ--> q, put Apq → xArsx′ in P.
18.12 Example (L6 from slide 69)
0 (start) --a;Λ;#--> 0
0 --Λ;Λ;Λ--> 1
1 --b;Λ;#--> 1
1 --Λ;Λ;Λ--> 2 (final)
2 --c;#;Λ--> 2
• We have just one final state.
• The Λ;Λ;Λ transitions are not allowed. Choose $ ∉ Γ, and put instead:

0 (start) --a;Λ;#--> 0
0 --Λ;Λ;$--> 3
3 --Λ;$;Λ--> 1
1 --b;Λ;#--> 1
1 --Λ;Λ;$--> 4
4 --Λ;$;Λ--> 2 (final)
2 --c;#;Λ--> 2
1. A00 → Λ,  A11 → Λ,  A22 → Λ,  A33 → Λ,  A44 → Λ
2. A02 → A01A12
A03 → A01A13 | A02A23
A04 → A01A14 | A02A24 | A03A34
(N.B. we should also have e.g. A01 → A03A31, but we can see from the shape of the PDA that this will be useless.)
3. (a) Consider the transition 0 --a;Λ;#--> 0. Its “pair” is 2 --c;#;Λ--> 2. So: A02 → aA02c.

(b) Consider the transition 1 --b;Λ;#--> 1. Its “pair” is 2 --c;#;Λ--> 2. So: A12 → bA12c.

(c) Consider the transition 0 --Λ;Λ;$--> 3. Its “pair” is 3 --Λ;$;Λ--> 1. So: A01 → ΛA33Λ, i.e. A01 → A33.

(d) Consider the transition 1 --Λ;Λ;$--> 4. Its “pair” is 4 --Λ;$;Λ--> 2. So: A12 → ΛA44Λ, i.e. A12 → A44.
Summarising:
A02 → A01A12 | aA02c
A01 → A33
A12 → bA12c | A44
A33 → Λ
A44 → Λ
and simplifying (A02 becomes S, A12 becomes T, everything else derives only Λ):

S → T | aSc
T → bTc | Λ
which indeed generates the language {ambncm+n | m,n ≥ 0}.
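One way to check this claim mechanically is to enumerate every terminal string the grammar derives up to some length. A sketch (not from the slides; the function name and the uppercase-means-nonterminal convention are my own):

```python
# A sketch (not from the slides): enumerate all terminal strings of
# bounded length derivable from a CFG, using leftmost derivation steps.
# Nonterminals are uppercase letters; terminals are lowercase.

def language(prods, start, maxlen):
    """All terminal strings of length <= maxlen derivable from start."""
    out, frontier = set(), {start}
    while frontier:
        sf = frontier.pop()                     # a sentential form
        nts = [i for i, s in enumerate(sf) if s.isupper()]
        if not nts:
            out.add(sf)
            continue
        i = nts[0]                              # leftmost nonterminal
        for rhs in prods[sf[i]]:
            new = sf[:i] + rhs + sf[i + 1:]
            # prune forms whose terminals already exceed the length bound
            if sum(1 for c in new if not c.isupper()) <= maxlen:
                frontier.add(new)
    return out

prods = {'S': ['T', 'aSc'], 'T': ['bTc', '']}   # S -> T | aSc, T -> bTc | Lambda
want = {'a' * m + 'b' * n + 'c' * (m + n)
        for m in range(4) for n in range(4) if m + n <= 3}
print(language(prods, 'S', 6) == want)          # True
```

Because every sentential form of this grammar contains at most one nonterminal, the frontier stays small; for grammars with many nonterminals the enumeration grows quickly, so this is only a testing aid.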
Chapter 19
Non-CF Languages
19.1 Not All Languages are Context Free
We know Reg ⊂ CFL: all regular languages are context free, but there are context free languages that are not regular.
Are there languages that are not context free? Yes, there are many such languages!
The following languages cannot be generated by any context free grammar, nor can they be recognized by any push-down automaton:

• {anbncn | n ≥ 0} (would need two counters)
• {ww | w ∈ {a, b}∗} (would need a queue, not a stack)
• {next week’s lotto numbers} (would need a miracle)
Many of the constraints on programming languages cannot be expressed (easily) using CFGs. For example:

• all the identifiers in a list of declarations must be distinct
• identifiers must be declared before they are used
• procedure calls must have arguments consistent with their declarations.
These are constraints on the context in which a particular piece of otherwise context-free syntax may occur. Approaches to dealing with them include:

ad hoc approaches: write a CFG to define the CF parts of the language (sometimes called a covering grammar), build a parser, then augment the parser with code to check the context constraints;

attribute grammars: CFGs annotated to show the extra relationships between nonterminals;
context-sensitive grammars.
19.2 Context Sensitive Grammars
A phrase-structure grammar is a structure G = (Σ, N, S, P), where Σ, N, and S are as we have seen already.
Different classes of languages arise by placing different constraints on the productions, P.
Let X,Y ∈ N ; φ ∈ Σ∗; and α, β ∈ (Σ ∪N)∗.
Regular grammar: X → φY or X → φ
Context free grammar: X → β
Context sensitive grammar: α→ β, where |α| ≤ |β|.
For example, a CSG may include a production abSd → abcTd, which says in effect that S → cT, but only if it is preceded by ab and followed by d.
19.3 Example (1)
The following CSG generates the language described by the regular expression (a+b)∗(ac+bd).

S → aS
S → bS
aS → aT
bS → bU
T → c
U → d
19.4 Example (2)
The following CSG generates the language {wcw | w ∈ {a, b}∗}.
S → c     S → aTS    S → bUS
Ta → aT   Tb → bT    Tc → ca
Ua → aU   Ub → bU    Uc → cb
19.5 Generating the empty string
Our definition for CSG has productions α → β where |α| ≤ |β|. This doesn’t permit Λ productions (|β| = 0), so we must also allow S → Λ if the language to be generated contains Λ.
The following CSG generates the language {anbncn | n ≥ 0}.
S → Λ | aTbc
aTb → aaTbbU | ab
Ub → bU
Uc → cc
19.6 CFL ⊂ CSL
We just saw an example of a language (anbncn) that is context sensitive but not context free (proof: pumping lemma for context free languages – not covered in this course).

To show CFL ⊂ CSL, we need only show that every context free language is context sensitive.
Let G = (Σ, N, S, P) be a CFG for a language L. We will construct a CSG G′ = (Σ, N, S, P′).
Without loss of generality, suppose P has no Λ productions, except perhaps on S.
For each production X → β in P, put α → β (where α = X) in P′. Now every production has the form α → β where α, β ∈ (Σ ∪ N)∗, |α| = 1, and (with perhaps one permitted exception) |β| ≥ 1, so G′ is a CSG.
Chapter 20
Closure Properties
20.1 Closure Properties
We now turn our attention to closure properties.
Recall that regular languages are closed under union, concatenation, Kleene closure, complementation, and intersection. That is, if L1 and L2 are regular:
• L1 ∪ L2 is regular;
• L1L2 is regular;
• L∗1 is regular;
• the complement of L1 is regular; and
• L1 ∩ L2 is regular.
Do the corresponding closure properties hold for context-free languages?
20.2 Union of Context Free Languages is Context Free
Theorem If L1 and L2 are context free languages, their union L1 ∪ L2 is also context free.
Proof (using grammars) Let G1 = (Σ1, N1, S1, P1) and G2 = (Σ2, N2, S2, P2) be CFGs for L1 and L2 respectively. Without loss of generality, let N1 ∩ N2 = ∅ (if not, systematically rename all nonterminals in one of the grammars). Also, let S ∉ N1 ∪ N2 be a fresh nonterminal symbol.
G3 = (Σ1 ∪ Σ2, N1 ∪ N2 ∪ {S}, S, P1 ∪ P2 ∪ {S → S1, S → S2}) is a CFG for L1 ∪ L2.
Example
Language(G1) = {ambm}:   S → aSb | Λ
Language(G2) = {bncn}:   S → bSc | Λ

First, rename the nonterminals of G1 and G2 apart, by adding subscripts:

S1 → aS1b | Λ    S2 → bS2c | Λ
Now form the union as above; the resulting grammar:
S → S1 | S2 S1 → aS1b | Λ S2 → bS2c | Λ
is a CFG for {ambm} ∪ {bncn}.
Alternative proof (using PDAs) Let P1 = (Σ1, Γ1, Q1, q1, F1, δ1) and P2 = (Σ2, Γ2, Q2, q2, F2, δ2) be PDAs for L1 and L2 respectively.
Without loss of generality, let Q1 ∩ Q2 = ∅, and let q0 ∉ Q1 ∪ Q2 be a fresh state.

P3 = (Σ1 ∪ Σ2, Γ1 ∪ Γ2, Q1 ∪ Q2 ∪ {q0}, q0, F1 ∪ F2, δ3) is a PDA for L1 ∪ L2, where δ3 is δ1 ∪ δ2 together with transitions δ3(q0, Λ, Λ) 7→ (q1, Λ) and δ3(q0, Λ, Λ) 7→ (q2, Λ).
Example
P1:
q1 (start) --a;Λ;#--> q1
q1 --Λ;Λ;Λ--> q2 (final)
q2 --b;#;Λ--> q2

P2:
q3 (start) --b;Λ;#--> q3
q3 --Λ;Λ;Λ--> q4 (final)
q4 --c;#;Λ--> q4

P3: a new start state q0 with q0 --Λ;Λ;Λ--> q1 and q0 --Λ;Λ;Λ--> q3; the transitions of P1 and P2 are unchanged, and q1, q3 are no longer start states.
20.3 Concatenation of CF Languages is Context Free

Theorem If L1 and L2 are context free languages, their concatenation L1 ⌢ L2 is also context free.
Proof (using grammars) Let G1 and G2 be as before, with no shared nonterminals, and S a fresh nonterminal. Then G3 = (Σ1 ∪ Σ2, N1 ∪ N2 ∪ {S}, S, P1 ∪ P2 ∪ {S → S1S2}) is a CFG for L1 ⌢ L2.
20.4 Kleene Star of a CF Language is CF

Theorem If L is a context free language, so is L∗.
Proof (using grammars) Let G = (Σ, N, S, P), and let S′ ∉ N be a fresh nonterminal.
G′ = (Σ, N ∪ {S′}, S′, P ∪ {S′ → Λ, S′ → SS′}) is a CFG for L∗.
Example L = {ambm}, L∗ = {ambm}∗

G = ({a, b}, {S}, S, {S → aSb, S → Λ})
G′ = ({a, b}, {S, S′}, S′, {S′ → SS′, S′ → Λ, S → aSb, S → Λ})
Exercise Eliminate the Λ productions from G′.
20.5 Intersections and Complements of CF Languages
Theorem The intersection L1 ∩ L2 of CF languages L1 and L2 may be CF, or it may not.
Proof We already know that intersections of regular languages are regular, so if L1 and L2 are regular (and hence CF), L1 ∩ L2 is also regular (and hence CF).
However, if we take L1 = {anbncm} and L2 = {anbmcm}, which are both CF (see Cohen, p385), their intersection is L3 = {anbncn}, which we have already seen is not CF.
Theorem The complement L′ of a CF language L may or may not be CF.
Proof Suppose L1 and L2 are CF; we know L1 ∩ L2 may be non-CF.

However, if the complement of every CF language were CF, then L1 ∩ L2 = (L1′ ∪ L2′)′ would always be CF (since CFLs are closed under union): contradiction!
Chapter 21
Summary of CF Languages
21.1 Why context-free?
• There are non-regular languages
– Proof: the pumping lemma
• Sentences of many languages have a natural nested or recursive structure

– contrast with regular languages, whose structure is essentially sequence/selection/repetition

• the nested structure can be exposed by writing context free grammars, and by drawing parse trees

– terminals appear in sentences, and at the leaves of parse trees

– nonterminals name categories of fragments of sentences, and appear on the interior nodes of parse trees.

– productions describe how the nonterminals relate to one another, and to the terminals.
21.2 Phrase-structure grammars
A phrase-structure grammar G = (Σ, N, S, P ) where:
• Σ is a finite set of terminals;
• N is a finite set of nonterminals, Σ ∩N = ∅;
• S is the start symbol, S ∈ N ; and
• P is a set of productions.
A string (or sentence) is a sequence of terminals: an element of Σ∗.
A sentential form is a sequence of terminals and/or nonterminals: an element of (Σ ∪ N)∗.
A production is a pair of sentential forms, written α → β (α, β ∈ (Σ ∪ N)∗).
21.3 Special cases of Phrase-structure grammars
• A Context Free Grammar (CFG) is a phrase-structure grammar in which every production α → β has α ∈ N. That is, the left-hand side of every production is a single nonterminal.

We call the class of languages generated by CFGs context free languages (CFL).
• A Regular Grammar (RG) is a CFG with the additional property that, for every production α → β, β ∈ Σ∗ or β ∈ Σ∗N. That is, the right-hand side of every production is a sequence (perhaps empty) of terminals, optionally followed by a single nonterminal.

The class of languages generated by RGs is the same as the class accepted by finite automata, so by Kleene’s theorem is the same as the class of regular languages.

Every RG is a CFG, so every regular language is a context free language: Reg ⊆ CFL.
• A Context Sensitive Grammar (CSG) is a phrase-structure grammar in which every production α → β has either |α| ≤ |β|, or α = S and β = Λ. That is, the left- and right-hand sides are arbitrary sentential forms, with the only restriction being that the right-hand side is no shorter than the left-hand side, except that there may be a single Λ production for the start symbol. They are sometimes called non-reducing grammars.

We call the class of languages generated by CSGs context sensitive languages (CSL).

Every CFG may be transformed to an equivalent CFG whose only Λ production is on the start symbol, so every CFG is equivalent to a CSG. Hence, every context-free language is a context sensitive language: CFL ⊆ CSL.
21.4 Derivations
Given G = (Σ, N, S, P) and w ∈ Σ∗, a derivation in G of w is a sequence α0 . . . αn of sentential forms, where α0 = S, αn = w, and for each 0 ≤ i < n, αi+1 is derived from αi by replacing some nonterminal X in αi by β, where (X → β) ∈ P.
G generates w iff there is a derivation in G of w.
A leftmost derivation is one in which the leftmost nonterminal of αi is always replaced. If there is a derivation in G of w, there is a leftmost derivation.
String w is ambiguous for G if there is more than one leftmost derivation in G of w. G is ambiguous if at least one w is ambiguous for G. There may or may not be an equivalent unambiguous grammar.
21.5 Parsing
Parsing is the process of finding derivations. The derivation may be summarised by a parse tree.
Parsing in regular grammars essentially simulates the operation of an NFA. Parsing in arbitrary grammars is, in general, more difficult.
21.6 Recursive descent and LL(1) grammars
An LL(1) grammar is a CFG that may be parsed top-down and deterministically. A grammar is LL(1) if every nonterminal satisfies the two requirements.
If a grammar is not LL(1), there may or may not be an equivalent LL(1) grammar.
A recursive descent parser is a “one-off” recogniser for an LL(1) grammar.
21.7 Pushdown automata
A PDA P = (Σ, Γ, Q, q0, F, δ) is an automaton that can accept a string w if there is a path from the initial state q0 to some final state in F such that:
• the stack is empty initially and finally;
• the sequence of read symbols along the path is w;
• each pop symbol along the path matches the symbol currently at the topof the stack.
The class of languages accepted by PDAs is exactly the same as the class of languages generated by CFGs (i.e., the context-free languages).
A PDA is deterministic (DPDA) if, for every q ∈ Q, x ∈ Σ, and y ∈ Γ, there is at most one enabled transition.
For a given CFL, there may or may not be a DPDA that accepts it.
21.8 Closure
• There are non-CF languages: for example, {anbncn | n ≥ 0}.
– Proof : the pumping lemma for CF languages (NOT DONE).
• The class of CFLs is closed under:
– union
– concatenation
– Kleene closure
• The class of CFLs is not closed under:
– intersection
– complementation
21.9 Constructions on CFLs
• FA to regular grammar
• Regular grammar to FA (NOT DONE)
• Ambiguous to unambiguous grammar (perhaps)
• Remove Lambda productions
• Remove unit productions
• CFG to LL(1) form (perhaps)
• CFG to PDA (top-down and bottom-up)
• PDA to CFG
Part III
Turing Machines
Chapter 22
Turing Machines I
22.1 Introduction
So far in COMP 202 you have seen:
• finite automata
• pushdown automata
In this part of COMP 202 we will look at Turing machines.
Turing machines are named after the English logician Alan Turing. They were introduced in 1936 in his paper ‘On computable numbers, with an application to the Entscheidungsproblem’.
We can think of finite automata, pushdown automata and Turing machines as models of computing devices. A finite automaton has
• a finite set of states
• no memory.
A pushdown automaton has
• a finite set of states
• unlimited memory with restricted access.
A Turing machine has
• a finite set of states
• unlimited memory with unrestricted access
Dates:
• Finite automata were first described in 1943
• Pushdown automata were first described in 1961
• Turing machines were first described in 1936
22.2 Motivation
While thinking of finite automata, pushdown automata, and Turing machines as machines of increasing power is quite useful, it does not give any insight into why or how Turing invented his abstract machines. This story is worth telling.
Nowadays, it is impossible to say exactly how many computing devices there are in the world; we can only give some vague estimate in terms of hundreds of thousands, or millions. We can say with great precision how many computers there were in the world in the early 1930s: none.
How then did Turing come to invent Turing machines?
22.3 A crisis in the foundations of mathematics
In the beginning of the 20th century, mathematics was facing a crisis in its foundations.
David Hilbert was the leading mathematician of the early 1900s. He believedthat mathematics would escape intact from the crisis it faced.
Hilbert believed that mathematics was decidable: that is, for every mathematical problem, there is an algorithm which either solves it or shows that no solution is possible.

The problem of showing that mathematics was decidable came to be known as the Entscheidungsproblem: German for ‘decision problem’.
In order to make progress on this problem it was necessary to get a clearer idea about:

• algorithms, and

• the class of computable functions.
22.4 The computable functions: Turing’s approach
The question that Alan Turing was really trying to investigate was “What class of functions can be computed by a person?”
In the 1930s a ‘computer’ was a person who performed calculations. The calculations must be algorithmic in nature: that is, they must be the sorts of thing one could, in principle, build a machine to perform.
In ‘On computable numbers, with an application to the Entscheidungsproblem’, Turing imagines what actions a computer could perform, and tries to abstract away the details.
22.5 What a computer does
Imagine a person sitting at a desk performing calculations on paper. The person can:
• read the symbols that have been written on the paper,
• write symbols on the paper,
• erase what has been written, and
• perform actions dependent on what symbols were read.
We abstract away some of the details.
We assume that the paper is in the form of a tape of individual squares, each of which can either be blank or hold just one symbol. The computer can focus attention on only one cell of the tape at a time. We assume that the tape is not limited.
The actions that the abstract computer can perform are then:
• reading a symbol,
• writing a symbol,
• erasing a symbol, and
• focussing attention on the next or the previous cell.
We can give a formal description of these abstract machines.
Turing then asserts that these actions are the sorts of things one could, in principle, build a machine to perform.
Turing did, in fact, become involved with the early efforts to build real, physical machines, but in the 1930s his focus was on purely abstract machines.
22.6 What have we achieved?
Now we have a formal model of an abstract computer, we can study the functions that it can compute. We can call such functions “the Turing machine computable functions”.
There is no certainty that the Turing machine computable functions are all and only the computable functions, because we may have made a mistake when analysing the actions of the computer.
22.7 The computable functions: Church’s approach
At the same time that Turing was thinking about abstract machines, the American logician Alonzo Church was taking a different approach to defining the class of computable functions.

Church had developed a notation for describing functions, called the λ-calculus. The language of the λ-calculus is very simple. A term of the λ-calculus is:
• a variable, or
• an application of two λ-terms, or
• the abstraction of a variable over a λ-term
More formally:
TERM → VAR
| TERM TERM
| λ VAR . TERM
VAR → x1, x2, x3, . . .
We have one rule, called β-reduction:
(λx.M)N →β [N/x]M
where [N/x]M is read as “substitute N for x in M”. Substitution is algorithmic.
Now, the λ-calculus looks nothing like Turing machines, and it was constructed on a completely different basis. Nonetheless, the λ-calculus lets us define functions. We can call such functions “the λ-calculus computable functions”.
Just as before, there is no certainty that the λ-calculus computable functions are all and only the computable functions.
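To make β-reduction concrete, here is a sketch (not from the slides) with λ-terms represented as nested tuples, and capture-naive substitution, which is safe for the example below because no variable capture can occur:

```python
# A sketch (not from the slides): lambda-terms as nested tuples —
# ('var', x), ('app', M, N), ('lam', x, M) — with capture-naive
# substitution (adequate here; full substitution must rename to
# avoid capturing free variables).

def subst(term, x, n):
    """[N/x]M: replace free occurrences of variable x in term by n."""
    kind = term[0]
    if kind == 'var':
        return n if term[1] == x else term
    if kind == 'app':
        return ('app', subst(term[1], x, n), subst(term[2], x, n))
    # 'lam': if the binder rebinds x, no occurrence inside is free
    if term[1] == x:
        return term
    return ('lam', term[1], subst(term[2], x, n))

def beta(term):
    """One beta step at the root: (lambda x. M) N -> [N/x]M."""
    if term[0] == 'app' and term[1][0] == 'lam':
        _, (_, x, m), n = term
        return subst(m, x, n)
    return term

identity = ('lam', 'x', ('var', 'x'))
print(beta(('app', identity, ('var', 'y'))))   # ('var', 'y')
```

The single rule really is the whole calculus: repeated β-steps (wherever a redex appears, not just at the root) are what "computing with the λ-calculus" means.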
22.8 First remarkable fact
Although the λ-calculus and Turing’s abstract machines look completely different, it turns out that they both define exactly the same class of functions.
So the λ-calculus computable functions are just the same functions as the Turing machine computable functions.
Many other approaches to defining the class of computable functions have been proposed, e.g.
• the µ-recursive functions
• Post systems
• unlimited register machines (URMs)
• Minsky systems
• . . .
All have been shown to define exactly the same class of functions.
Furthermore, no-one has come up with a function which is obviously computable and which is not in this class.
22.9 Church-Turing thesis
The assertion that the class of Turing machine computable functions is the class of computable functions is called the Church-Turing thesis.

The Church-Turing thesis is not something that can be formally proven. This is not because we are stupid, or lack cunning, but because it relates our informal notion of computable with a formal system.

It is, of course, possible to give a formal proof that the λ-calculus computable functions are just the same functions as the Turing machine computable functions.
22.10 Second important fact
Every Turing machine embodies an algorithm. We can think of the initial configuration of the tape for a Turing machine as the data (or input) which the machine is supplied with.
• We can describe one Turing machine to another by using symbols on a tape.

• We can construct a Turing machine TU which takes as input a description of any Turing machine T1, and behaves just like T1.
22.11 The universal machine
A machine like TU is called a universal Turing machine.
A universal Turing machine can be made to behave like any Turing machine,just by supplying it with appropriate data.
The existence of universal Turing machines is quite remarkable.
Physical calculating machines had been constructed prior to the 1930s, but these were all special purpose machines.
Turing had shown that special purpose machines are pointless:

• if you want a machine to add up tables of financial data, build a universal machine and then describe the appropriate special machine to it;

• if you want a machine to find numeric solutions to differential equations, build a universal machine and then describe the appropriate special machine to it;

• if you want a machine to play music backwards, build a universal machine and then describe the appropriate special machine to it;

• if you want a machine to do anything (algorithmic): build a universal machine and then describe the appropriate special machine to it.
In modern parlance we call a universal Turing machine “a computer”, and we call the process of supplying it with appropriate data to mimic another Turing machine “programming”.
This is the sense in which a computer is a general purpose machine.
Of course, in 1936 it was not possible to build a practical physical approximation to a universal Turing machine.
22.12 Third important fact
We have still not seen what Turing machines have to do with the Entscheidungsproblem.
We can use the idea of a universal Turing machine to show that there are indeed undecidable problems in mathematics.
Because a universal Turing machine can be used to encode any Turing machine, we can use universal machines to ask questions about Turing machines themselves. We can use this technique to construct a purely mathematical problem which is undecidable.
Chapter 23
Turing machines II
23.1 Introduction
In the last lecture we looked at how Turing machines came to be developed, and gave an informal description of:
• the Church-Turing thesis
• Universal Turing machines
• formally undecidable problems
Now we will proceed with a formal development of the theory of Turing machines.
Recall that we stated that we can think of finite automata, pushdown automata and Turing machines as models of computing devices. A finite automaton has
• a finite set of states
• no memory.
A pushdown automaton has
• a finite set of states
• unlimited memory with restricted access.
A Turing machine has
• a finite set of states
• unlimited memory with unrestricted access
23.2 Informal description
Recall that our informal description of a Turing machine was that there was:
• an infinite tape
• a head, which can:
– read a symbol
– write a symbol
– erase a symbol
– move left
– move right
23.3 How Turing machines behave: a trichotomy
Consider the following program (adapted from Cohen):
read x;
if x < 0 then halt;
if x = 0 then x := 1/x;
while x > 0 do x := x + 1
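The same trichotomy can be observed by transcribing the program into Python (a sketch, not from the slides): run returns for negative input, raises ZeroDivisionError for zero, and loops forever for positive input.

```python
# A sketch (not from the slides) of the program above. Calling run(x)
# halts for x < 0, crashes for x = 0, and runs forever for x > 0.
def run(x):
    if x < 0:
        return 'halted'
    if x == 0:
        x = 1 / x        # crash: division by zero
    while x > 0:
        x = x + 1        # runs forever: x only grows
    return 'halted'

print(run(-1))           # halted
```

Note that only the halting and crashing cases can be demonstrated by actually calling the function; the looping case has to be argued about, which is precisely the point the slide is making.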
This program can do 3 things:
• it can halt;
• it can crash;
• it can run forever.
Turing machines exhibit the same behaviour. They can:
• halt;
• crash;
• run forever.
23.4 Informal example
Suppose we want to define a Turing machine to accept the language
{w#w|w ∈ {0, 1}∗}
Our input string will be presented to us on a tape, and initially the head is located at the leftmost end of the tape.
For example the input might be 10011#10011.
How could we decide whether the string was in the language? Clearly the head is going to have to move back and forth along the string.
Let’s see how we might go about this.
What we do first depends on whether we are looking at a 0, a 1 or a #.
If we are looking at a # then we must move right and check that the next cell is empty.
If it is a 0 or a 1, then what should we do?
• we should mark the cell as visited
• we should go off and find the #
• then we should find a 0 or a 1 as appropriate in the next cell
• then we should mark this cell
• and then we should repeat this until we are finished
We can mark a cell by writing a new symbol, say x, in it.
But of course now we have a tape with 0’s, 1’s, #’s and x’s in it, so our method will have to change a bit.
23.5 Towards a formal definition
The formal definition of a Turing machine follows a similar pattern to the formal definitions that we have given of finite automata and pushdown automata.
Different textbooks give slightly different definitions. For example
• John Martin’s ‘Introduction to languages and the theory of computation’ defines a Turing machine as a 5-tuple,

• Michael Sipser’s ‘Introduction to the theory of computation’ defines a Turing machine as a 7-tuple, and

• Daniel Cohen’s ‘Introduction to computer theory’ splits the difference and says we have to give 6 things to define a Turing machine.

So, care must be taken when reading from more than one source. We shall use Cohen’s definition, with slight adaptations.
23.6 Alphabets
In a Turing machine we need two alphabets,
• Σ, the input alphabet
• Γ, the tape alphabet
We use ∆ for the blank symbol, much as we use Λ for the empty string and ∅ for the empty language.
Cohen stipulates that ∆ ∉ Σ and ∆ ∉ Γ. Often we will have Σ ⊂ Γ.
23.7 The head and the tape
We have an infinite tape, and a head which is located at one of the cells.
We supply our Turing machine with input w = w1w2 . . . wn−1wn by entering the symbols w1, w2, . . . , wn−1, wn in the first n cells of the tape.
All the other cells in the tape initially have ∆ in them.
Initially the head is located at the first cell in the tape.
Because we can write on the tape we can use it as a memory.
23.8 The states
We have a finite set of states: Q.
One of the states is the start state: q0 ∈ Q.
Some subset of the states are the halt states: F ⊆ Q.
23.9 The transition function
We have a transition function, δ, which depends on:
• which state we are in
• what symbol is in the cell the head is at
and which can tell us
• what state to go to
• what symbol to write in the cell the head is at
• whether to move the head left or right.
23.10 Configuration
So, as we compute we (usually) move through the states of the machine, and (usually) the head moves along the tape.

We can represent the configuration of the machine as a triple, consisting of:
• the state the machine is in
• the contents of the tape
• the location of the head
23.11 Formal definition
Definition 12 (Turing machine) A Turing machine is a 6-tuple (Q, Σ, Γ, δ, q0, F) where:
• Q is a finite set: the states
• Σ is a finite set: the input alphabet
• Γ is a finite set: the tape alphabet
• δ is a function from Q × (Σ ∪ Γ ∪ {∆}) to Q × (Γ ∪ {∆}) × {L, R}: the transition function
• q0 ∈ Q: the start state
• F ⊆ Q: the final or accepting states.
23.12 Representing the computation
We represent the configuration of a Turing machine by writing the state above the tape:

s
σ1 . . . σk . . . σn

where:

• s is the state the machine is in

• σ1 . . . σk . . . σn is the ‘meaningful’ part of the tape

• σk, the symbol the state is written above, is the symbol about to be read

For example, if the start state is S1, and the input is babba, then the initial configuration will be:

S1
babba
We can use a sequence of configurations to trace the computation that themachine performs.
23.13 A simple machine
Let M1 = (Q, Σ, Γ, δ, q0, F), where

Q = {S1, S2, S3, S4}   Σ = {a, b}   Γ = {a, b}   q0 = S1   F = {S4}
and δ is given by the table:
State  Reading  State  Writing  Moving
S1     a        S2     a        R
S1     b        S2     b        R
S2     b        S3     b        R
S3     a        S3     a        R
S3     b        S3     b        R
S3     ∆        S4     ∆        R
23.14 Graphical representation of M1
[State diagram of M1: start state 1 goes to state 2 on a,a,R and on b,b,R; state 2 goes to state 3 on b,b,R; state 3 loops on a,a,R and on b,b,R; state 3 goes to the halt state 4 on ∆,∆,R.]
• Every time we read a symbol we move one step to the right along the tape
• Writing the symbol we have just read leaves the cell unaffected by our visit.
23.15 Some traces
If we give this machine abb we will get:

S1      S2      S3      S3       S4
abb  →  abb  →  abb  →  abb∆  →  abb∆∆
If we give this machine bab we will get:

S1      S2
bab  →  bab  →  crash
A little thought should show that this machine accepts Language((a + b)b(a + b)*).

We should not be surprised that this machine accepts a regular language, as it just traversed the input string from left to right and did not change the contents of the tape.
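The traces above can be reproduced with a small simulator (a sketch; the encoding of machines as Python dictionaries is our own, not from the lectures). A step limit stands in for loop detection, since looping cannot in general be decided:

```python
# A minimal Turing machine simulator (a sketch).  The machine M1 above
# is encoded as a transition table:
#   (state, symbol) -> (new state, symbol to write, move)
BLANK = "∆"

M1 = {
    ("S1", "a"): ("S2", "a", "R"),
    ("S1", "b"): ("S2", "b", "R"),
    ("S2", "b"): ("S3", "b", "R"),
    ("S3", "a"): ("S3", "a", "R"),
    ("S3", "b"): ("S3", "b", "R"),
    ("S3", BLANK): ("S4", BLANK, "R"),
}

def run(delta, start, halts, word, max_steps=10_000):
    """Run a TM; return 'halt', 'crash', or 'loop?' (step limit hit)."""
    tape = dict(enumerate(word))          # sparse tape; blanks implicit
    state, head = start, 0
    for _ in range(max_steps):
        if state in halts:
            return "halt"
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return "crash"                # undefined transition
        state, written, move = delta[key]
        tape[head] = written
        head += 1 if move == "R" else -1
    return "loop?"                        # we cannot actually decide looping!

print(run(M1, "S1", {"S4"}, "abb"))   # halt
print(run(M1, "S1", {"S4"}, "bab"))   # crash
```

The sparse-dictionary tape gives a tape that is infinite in both directions without allocating it up front.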
23.16 Turing machines can accept regular languages
This is a general property, which we can express as a theorem.
Theorem 15 Every regular language can be accepted by a Turing machine.
The proof consists of taking an FA which accepts a regular language and turning it into a Turing machine which accepts the same language. Basically, we add a new halt state and adjust the transition function.
23.17 Proof
Let L be a regular language. Then L is accepted by an FA ML = (Q, Σ, δ, q0, F). Then TL = (Q′, Σ′, Γ, δ′, q′0, F′) where:
• F ′ = {SHALT}
• Q′ = Q ∪ F ′
• Σ′ = Σ
• Γ = Σ
• if δ(s, σ) is defined then δ′(s, σ) = (δ(s, σ), σ, R)
• ∀f ∈ F. δ′(f, ∆) = (SHALT, ∆, R)
is a Turing machine which accepts L.
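A sketch of this construction, using a hypothetical dictionary encoding of transition functions (the encoding is ours, not from the lectures):

```python
# Sketch of the FA-to-TM construction in the proof.  Each FA move
# becomes a TM move that writes back the symbol it just read and moves
# right; a fresh halt state SHALT is added.
BLANK = "∆"
HALT = "SHALT"

def fa_to_tm(fa_delta, accepting):
    """fa_delta: dict (state, symbol) -> state (a partial FA transition
    function).  Returns (tm_delta, halt_states)."""
    tm_delta = {(s, c): (s2, c, "R") for (s, c), s2 in fa_delta.items()}
    # From any accepting FA state, reading the blank just past the end
    # of the input sends the TM to the new halt state.
    for f in accepting:
        tm_delta[(f, BLANK)] = (HALT, BLANK, "R")
    return tm_delta, {HALT}

# The FA for (a+b)b(a+b)*, re-derived as a TM; compare M1 above:
fa = {("S1", "a"): "S2", ("S1", "b"): "S2", ("S2", "b"): "S3",
      ("S3", "a"): "S3", ("S3", "b"): "S3"}
tm, halts = fa_to_tm(fa, {"S3"})
print(tm[("S3", BLANK)])   # ('SHALT', '∆', 'R')
```

Undefined FA transitions stay undefined in the TM, so rejected strings crash, and since the head only ever moves right, loop(TL) = ∅.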
23.18 Another machine

Let M2 = (Q, Σ, Γ, δ, q0, F), where Q = {S1, S2, S3, S4, S5, S6}, Σ = {a, b}, Γ = {a, A, B}, q0 = S1, F = {S6}, and δ is given by:
State  Reading  State  Writing  Moving
S1     a        S2     A        R
S2     a        S2     a        R
S2     B        S2     B        R
S2     b        S3     B        L
S3     B        S3     B        L
S3     A        S5     A        R
S3     a        S4     a        L
S4     a        S4     a        L
S4     A        S1     A        R
S5     B        S5     B        R
S5     ∆        S6     ∆        R
23.19 Graphical representation of M2
[State diagram of M2: start state 1, halt state 6, with the transitions listed in the table above.]
This machine accepts the language {aⁿbⁿ | n > 0}.
This machine does write on the tape, and the head moves both left andright.
23.20 A trace
If we give this machine aabb we will get:

S1       S2       S2       S3       S4       S1       S2
aabb  →  Aabb  →  Aabb  →  AaBb  →  AaBb  →  AaBb  →  AABb  →

S2       S3       S3       S5       S5       S5        S6
AABb  →  AABB  →  AABB  →  AABB  →  AABB  →  AABB∆  →  AABB∆∆
To get a clearer idea of what is going on here, try tracing the computation on a longer string like aaaabbbb.
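One way to follow that suggestion is to let a program produce the trace (a sketch using our own dictionary encoding of M2; inputs outside the language make the table lookup fail, which corresponds to a crash):

```python
# Tracing M2, the a^n b^n machine, configuration by configuration.
BLANK = "∆"
M2 = {
    ("S1", "a"): ("S2", "A", "R"), ("S2", "a"): ("S2", "a", "R"),
    ("S2", "B"): ("S2", "B", "R"), ("S2", "b"): ("S3", "B", "L"),
    ("S3", "B"): ("S3", "B", "L"), ("S3", "A"): ("S5", "A", "R"),
    ("S3", "a"): ("S4", "a", "L"), ("S4", "a"): ("S4", "a", "L"),
    ("S4", "A"): ("S1", "A", "R"), ("S5", "B"): ("S5", "B", "R"),
    ("S5", BLANK): ("S6", BLANK, "R"),
}

def trace(delta, start, halts, word):
    """Return the list of (state, tape) configurations; a KeyError
    (undefined transition) models a crash."""
    tape, state, head = list(word), start, 0
    configs = []
    while True:
        configs.append((state, "".join(tape)))
        if state in halts:
            return configs
        if head == len(tape):
            tape.append(BLANK)        # extend the tape only on demand
        state, written, move = delta[(state, tape[head])]
        tape[head] = written
        head += 1 if move == "R" else -1

for state, tape in trace(M2, "S1", {"S6"}, "aabb"):
    print(state, tape)
```

Running this prints the 14 configurations of the trace above; the final tape appears as AABB∆ rather than AABB∆∆ only because this simulator extends the tape lazily.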
Chapter 24
Turing Machines III
24.1 An example machine
Consider the following machine T1 = (Q, Σ, Γ, δ, q0, F), where

Q = {S1, S2, S3}   Σ = {a, b}   Γ = {a, b}   q0 = S1   F = {S3}

and δ is given by the table:

State  Reading  State  Writing  Moving
S1     ∆        S1     ∆        R
S1     b        S1     b        R
S1     a        S2     a        R
S2     a        S3     a        R
S2     b        S1     b        R
[State diagram of T1: start state 1 loops on ∆,∆,R and on b,b,R; state 1 goes to state 2 on a,a,R; state 2 goes back to state 1 on b,b,R; state 2 goes to the halt state 3 on a,a,R.]
Now, T1 behaves as follows:
• if the string contains aa we reach the halt state, so the string is accepted by the TM

• if the string does not contain aa and ends in an a then the machine crashes

• if the string does not contain aa and ends in a b then the machine loops (it moves right over blanks forever)
24.2 Some definitions
Any Turing machine can exhibit this trichotomy, so, for every Turing machine T we define
• accept(T ) to be the set of strings on which T halts
• reject(T ) to be the set of strings on which T crashes
• loop(T ) to be the set of strings on which T loops
24.3 Computable and computably enumerable languages
Definition 13 (Computable language) A language L is computable if there is some Turing machine T such that:

1. accept(T) = L

2. loop(T) = ∅

3. reject(T) = L̄

(Here L̄ denotes the complement of L.)
Condition 2 tells us that the Turing machine must either halt gracefully or crash: it cannot go on forever.
Definition 14 (Computably enumerable language) A language L is computably enumerable if there is some Turing machine T such that:

1. accept(T) = L

2. loop(T) ∪ reject(T) = L̄
Some authors (Turing and Cohen included) use the terms “recursive” and “recursively enumerable”.
Theorem 16 Every computable language is computably enumerable.
Proof Suppose L is computable. Then by definition there is a Turing machine T such that

accept(T) = L
loop(T) = ∅
reject(T) = L̄

Now, loop(T) ∪ reject(T) = ∅ ∪ L̄ = L̄. Hence every computable language is computably enumerable.
24.4 Deciders and recognizers
Informally, we can think of a computable language as being one for which we can write a decider, a program whose behaviour is sure to tell us whether a string is in the language or not.
We can think of a computably enumerable language as one for which we can write a recognizer. A recognizer is a program which will tell us if a string is in the language, but which may loop if the string is not in the language.

A language for which we can write a decider is called a decidable language. A language for which we can write a recognizer is called a semi-decidable language.
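As a toy illustration (our own, not from the lectures), take the set of perfect squares: a decider answers on every input, while a naive unbounded search is only a recognizer:

```python
# A decider always halts with an answer; a recognizer is only
# guaranteed to halt on members of the set.
def decider(n):
    """Always terminates: True iff n is a perfect square."""
    i = 0
    while i * i < n:
        i += 1
    return i * i == n

def recognizer(n):
    """Returns True iff n is a perfect square -- but loops forever
    on non-squares, because the search is never cut off."""
    i = 0
    while True:
        if i * i == n:
            return True
        i += 1

print(decider(49), decider(50))   # True False
```

For perfect squares the search is easy to bound, so a decider exists; the point of the next sections is that for some languages no such bound (and no decider) can exist at all.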
24.5 A decidable language
Recall that a language is just a set of strings, so we can think of this problem in terms of whether membership of some arbitrary set is decidable.

As long as we can encode the elements of the set as strings, the set is a language.
Let B be a finite automaton, and w a string over the alphabet of B. Consider the set:
AFA = {(B,w)|w ∈ Language(B)}
Is AFA decidable, i.e. is it decidable whether a string is in the language of afinite automaton?
Theorem 17 If B is a finite automaton, and w a string over the alphabet of B, then {(B, w) | w ∈ Language(B)} is decidable.
Proof We already know that, for any FA B, there is a TM B′ that accepts the same language. By inspection of the construction, we can easily tell that loop(B′) = ∅.

We must construct a Turing machine which takes as input a description of B′ and the string w, and which halts if w ∈ accept(B′), and crashes otherwise.

A universal TM can do this.
24.6 Another decidable language
If A is an FA, then the set:
{A | Language(A) = ∅}
is also decidable. We must write a program which takes (a description of) A and checks whether any accepting state can be reached from the start state.
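The check is just graph reachability. A breadth-first search sketch, with FAs encoded as dictionaries (our own encoding, not from the lectures):

```python
# Emptiness test for an FA: Language(A) = ∅ iff no accepting state is
# reachable from the start state.
from collections import deque

def fa_is_empty(delta, start, accepting):
    """delta: dict (state, symbol) -> state."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if s in accepting:
            return False              # an accepting state is reachable
        for (s1, _), s2 in delta.items():
            if s1 == s and s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return True

# A one-state FA over {a, b} whose accepting state q is unreachable:
d = {("p", "a"): "p", ("p", "b"): "p"}
print(fa_is_empty(d, "p", {"q"}))     # True: its language is ∅
```

Since the search visits each state at most once, this always terminates: a decider, not merely a recognizer.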
24.7 A corollary
Write LA = Language(A) and LB = Language(B), and let C be an FA with

Language(C) = (LA ∩ L̄B) ∪ (L̄A ∩ LB)

Language(C) is the symmetric difference of LA and LB. Since the set of regular languages is closed under complementation, intersection and union, Language(C) is regular if LA and LB are.
Now, Language(C) = ∅ iff Language(A) = Language(B). So, in order to check whether two FA accept the same language we only have to write a program which takes their descriptions, constructs the FA which accepts their symmetric difference, and checks whether this is ∅.
24.8 Even more decidable sets
We have discussed FA and regular languages. What about the set:
APDA = {(P,w)|w ∈ Language(P )}
where P is a pushdown automaton, and w a word over the input alphabet of P. Is this set decidable?

It turns out that this set also is decidable. This is why we like context-free grammars: we can be sure we can write parsers for them. In fact, the proof that APDA is decidable allows us to construct a parser from any CFG.
24.9 An undecidable set
What about the set:
ATM = {(T,w)|w ∈ accept(T )}
where T is a Turing machine and w is a word over the input alphabet of T. This set is not decidable. We will see why later.
Chapter 25
Turing Machines IV
25.1 Introduction
We are looking at how we can construct an undecidable set. Recall:
Definition 15 (Computable language) A language L is computable if there is some Turing machine T such that:

1. accept(T) = L

2. loop(T) = ∅

3. reject(T) = L̄ (the complement of L)
Informally, we can think of a computable language as being one for which we can write a decider, a program whose behaviour is sure to tell us whether a string is in the language or not.
A language for which we can write a decider is called a decidable language.
Definition 16 (Computably enumerable language) A language L is computably enumerable if there is some Turing machine T such that:

1. accept(T) = L

2. loop(T) ∪ reject(T) = L̄
25.2 AFA is decidable
Let B be a finite automaton, and w a string over the alphabet of B; then
AFA = {(B,w)|w ∈ Language(B)}
is decidable. We can construct a Turing machine which takes as input (a description of) B and the string w, and which halts if w ∈ Language(B), and crashes otherwise.
25.3 APDA is decidable
Let P be a pushdown automaton, and w a string over the alphabet of P; then
APDA = {(P,w)|w ∈ Language(P )}
is decidable. We can construct a Turing machine which takes as input (a description of) P and the string w, and which halts if w ∈ Language(P), and crashes otherwise.
25.4 The halting problem
Let T be a Turing machine, and w a string over the input alphabet of T; then
ATM = {(T,w)|w ∈ accept(T )}
is not decidable. This problem is often called the halting problem, because it asks about the halting behaviour of Turing machines.

Proof Since we are trying to show that ATM is not decidable, we assume that ATM is decidable and derive a contradiction. If ATM is decidable then there is a Turing machine H such that:
1. accept(H) = ATM
2. loop(H) = ∅
3. reject(H) = the complement of ATM
We can write this as:
w ∈ accept(T) implies (T, w) ∈ accept(H)

w ∉ accept(T) implies (T, w) ∈ reject(H)
or we could treat H as a little program:
H(T, w) ≜ if T(w) halts then “yes” else “no”
Now, suppose we define a new machine D, which accepts machines which do not accept themselves as input.

There is no reason why we cannot give a Turing machine itself as input, any more than there is a reason why we cannot give a program itself as input. We can write D as a little program:
D(T) ≜ if H(T, T) = “yes” then crash else halt
Now, what happens if we give D itself as input?
D(D) = if H(D, D) = “yes” then crash else halt
But our definition of H requires that H(D, D) = “yes” precisely when D(D) halts.
So, D(D) = if D(D) halts then crash else halt. That is, D ∈ reject(D) if and only if D ∈ accept(D).

Now, it is not possible for D ∈ accept(D) and D ∈ reject(D) both to hold, so D ∈ loop(D). Hence, D(D) loops; but D(D) can only loop if H(D, D) loops, since whenever H answers, D either crashes or halts. This is the contradiction that we sought: we assumed loop(H) = ∅.

Hence, if T is a Turing machine, and w a string over the input alphabet of T, then

ATM = {(T, w) | w ∈ accept(T)}

is not decidable.

Our undecidable problem really turns on the existence of universal Turing machines, i.e. on the fact that Turing machines are powerful enough to describe themselves and their own behaviour.
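The argument can be written as a program sketch. Here `halts` is the hypothetical decider H; the point of the proof is precisely that no such function can be implemented:

```python
# The diagonal argument as a program sketch.
def halts(program, arg):
    """Hypothetical decider H: would return True iff program(arg)
    halts gracefully.  No such function can actually exist."""
    raise NotImplementedError("no such decider exists")

def D(program):
    """Crashes on programs that accept themselves, halts otherwise."""
    if halts(program, program):
        raise RuntimeError("crash")   # crash when the input accepts itself
    return "halt"

# D(D) has no consistent behaviour: if D(D) halts then halts(D, D) is
# True, so D(D) crashes -- and vice versa.  Hence `halts` cannot exist.
```

Running D(D) here simply raises NotImplementedError, which is the honest outcome: the decider it relies on is fictional.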
25.5 Other undecidable problems
Are there “real” problems which are undecidable, or are they merely mathematical curiosities?
First order logic It is not possible to write a theorem prover which, given a logical expression, is certain to be able to say whether the expression can be proved or not.

It is possible to write a semi-decision procedure: given a provable expression it is possible to say that it is provable.
Verification It is not possible to write a program which, given a specification S and a program P, will determine whether the program P meets the specification S.
Programming It is not possible to write a program which, given a specificationS, will construct a program P that meets it.
25.6 Closure properties
Theorem 18 The class of computable languages is closed under
1. union,
2. intersection, and
3. complementation.
(It is closed under concatenation and Kleene closure as well, but we won’t prove these results.)
Proof: Suppose L1 and L2 are computable. Then there exist Turing machines T1 and T2 with accept(Ti) = Li, loop(Ti) = ∅ and reject(Ti) = L̄i (for i = 1, 2).

1. (Union) Let T3 be a TM that simulates T1 and T2 simultaneously. For example, it might perform steps of T1 and T2 in turn, on disjoint parts of the tape. Let T3 halt if T1 or T2 halts; otherwise, both T1 and T2 crash, so T3 crashes.

2. (Intersection) Similarly, let T4 simulate T1 and T2 simultaneously, halting only if both T1 and T2 halt, and crashing if either crashes.
3. (Complementation) Let T5 simulate T1. Let T5 crash if T1 halts, and halt if T1 crashes (one of these must eventually happen!). Then:

accept(T5) = reject(T1) = L̄1

reject(T5) = accept(T1) = L1

loop(T5) = ∅
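The interleaved (“dovetailed”) simulation in the union construction can be sketched with Python generators standing in for the two machines — our own toy encoding, in which each generator yields once per step and finally returns "halt" or "crash":

```python
# Sketch of T3: run two step-generators in turn; halt if either halts,
# crash only once both have crashed.  (If both could loop we would loop
# too, but for computable L1, L2 neither machine loops.)
def interleave(gen1, gen2):
    outcomes = [None, None]
    gens = [gen1, gen2]
    while True:
        for i, g in enumerate(gens):
            if outcomes[i] is None:
                try:
                    next(g)                  # one step of machine i
                except StopIteration as e:
                    outcomes[i] = e.value    # the generator's return value
                    if e.value == "halt":
                        return "halt"
        if outcomes[0] == "crash" and outcomes[1] == "crash":
            return "crash"

def machine(result, n_steps):
    """A stand-in machine that runs n_steps steps, then halts/crashes."""
    for _ in range(n_steps):
        yield "running"
    return result

print(interleave(machine("crash", 3), machine("halt", 5)))  # halt
```

Note that the slow machine never blocks the fast one: the strict alternation is what lets T3 halt as soon as either component does.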
25.7 Computable and computably enumerable languages
Theorem 19 A language L is computable iff L is c.e. and L̄ is c.e.

Proof: “Only if” is easy: L is computable, so by closure, L̄ is computable. All computable languages are c.e.

“If” is harder. Let T1 and T2 be TMs such that:

accept(T1) = L    reject(T1) ∪ loop(T1) = L̄
accept(T2) = L̄    reject(T2) ∪ loop(T2) = L

Construct a TM T that simulates T1 and T2 simultaneously. If T1 halts, T halts. If T2 halts, T crashes. One of these must happen, since every string w belongs either to L or to L̄.
Now that we have an undecidable language we can go further and define a language which is not even computably enumerable.
Theorem 20 The complement of ATM is not c.e.

Proof: ATM is c.e., by the universal Turing machine. If the complement of ATM were also c.e. then ATM would be computable (Theorem 19). But ATM is not computable, so the complement of ATM is not c.e.
Chapter 26
Turing Machines V
26.1 A hierarchy of classes of language
We have now seen in COMP 202 a whole collection of classes of languages:
• all possible languages
• c.e. languages
• computable languages
• context-sensitive languages
• context-free languages
• regular languages
• finite languages
Each of these is a proper subset of the one above it.
26.2 A hierarchy of classes of grammar
In the 1950s the linguist Noam Chomsky produced a hierarchy of classes of grammars; four of the classes above correspond exactly to its levels (Types 0–3, tabulated in section 26.4 below).

Chomsky’s hierarchy is important, although it does not include every possible class of language. We can look for a finer structure than the one we have presented (LL(1) grammars, for example), but this is outside the scope of this course.
Chomsky was (is) a linguist, and he was really concerned with what sorts of grammars are required to describe natural languages, like English, Maori, Urdu, Swahili and so on.
Consider the following examples (adapted from Gazdar and Mellish’s Natural Language Processing in PROLOG):
• A doctor hired another doctor.
• A doctor whom a doctor hired hired another doctor.
• A doctor whom a doctor whom a doctor hired hired hired another doctor.
• A doctor whom a doctor whom a doctor whom a doctor hired hired hiredhired another doctor.
• . . .
These sentences are of the form:

• A doctor (whom a doctor)ⁿ (hired)ⁿ hired another doctor.

so the language is context-free but not regular. Are there any phenomena in English which require us to go beyond context-free?
Surprisingly, the answer is no!

In fact there is only one natural language which requires that we use a context-sensitive grammar. There is a structure which can occur in the dialect of Swiss-German spoken around Zurich which makes use of strings of the form:

aᵐbⁿcᵐdⁿ

This apparent lack of complexity in natural languages seems surprising: surprising enough that the authors of books on artificial intelligence and on formal language theory regularly make false pronouncements on this issue. Moral: look to linguists for facts about natural languages.
26.4 A hierarchy of classes of automaton
For four of these classes of language, there is a corresponding class of automaton:

Type    Automaton                  Language
Type 0  Turing machine             computably enumerable languages
Type 1  Linear-bounded automaton   context-sensitive languages
Type 2  Pushdown automaton         context-free languages
Type 3  Finite automaton           regular languages
We have studied all except linear-bounded automata.
26.5 Deterministic and nondeterministic automata
We know that for Type 3 automata, nondeterminism makes no difference: Kleene’s theorem tells us that the class of languages accepted by nondeterministic finite automata (NFAs) is the same as the class accepted by deterministic finite automata (FAs).
We know that for Type 2 automata, nondeterminism does make a difference: there are languages that can be accepted by nondeterministic pushdown automata (PDAs) that cannot be accepted by any deterministic pushdown automaton (DPDA).
We ignore Type 1 automata. What about Type 0?
26.6 Nondeterminstic Turing Machines
Our definition of Turing Machines was deterministic: given a state and a symbol on the tape, δ tells us exactly which state to go to, what to write on the tape, and in which direction to move the head:

δ : Q × (Σ ∪ Γ ∪ {∆}) → Q × (Γ ∪ {∆}) × {L, R}
This can easily be modified to allow nondeterministic Turing Machines (NTMs):
δ′ : Q × (Σ ∪ Γ ∪ {∆}) → 2^(Q × (Γ ∪ {∆}) × {L, R})
A string w is accepted by an NTM N = (Q, Σ, Γ, δ′, q0, F) if there is some path from q0 to some q ∈ F on a tape loaded initially with w.

We must be more careful about looping NTMs (what do we say if, for some NTM N and some string w, there is a path through N that rejects, and another that loops?).

If we consider only accepting paths, and do not distinguish rejecting from looping (so we cannot distinguish computable from computably enumerable languages), we get the surprising result that nondeterminism makes no difference.
26.7 NTM=TM
Clearly any deterministic Turing Machine can be described by an NTM: nondeterminism is not compulsory! So TM ⊆ NTM.

We must show NTM ⊆ TM: that is, that any NTM T may be simulated by a TM T′.

An NTM has only finitely many edges: label the edges of T with unique natural numbers. Now, for any string w ∈ accept(T), there is at least one finite sequence of labels corresponding to T accepting w.

We design a Universal Turing Machine, T′, that enumerates all paths in turn, checks whether the path is valid for w and T, and accepts when it finds one that is (Cohen has the details). Hence, accept(T) ⊆ accept(T′).
If w ∉ accept(T), T′ will try longer and longer paths and will never halt. Hence every string outside accept(T) is in loop(T′), so accept(T′) ⊆ accept(T).

Together with reject(T′) = ∅ (obvious), we have accept(T′) = accept(T).
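The enumeration at the heart of this simulation can be sketched as follows. Generating label sequences in length order guarantees that every finite path is eventually produced (the function name and encoding are our own; the real T′ would also check each candidate path against w and T):

```python
# Enumerate all finite edge-label sequences, shortest first -- the core
# of the deterministic simulation of an NTM.
from itertools import count, product

def paths(num_edges):
    """Yield every sequence over {0, ..., num_edges-1}, in length order."""
    for n in count(1):
        yield from product(range(num_edges), repeat=n)

g = paths(2)
first_six = [next(g) for _ in range(6)]
print(first_six)   # [(0,), (1,), (0, 0), (0, 1), (1, 0), (1, 1)]
```

Because the enumeration is exhaustive, an accepting path is always found if one exists; because it is endless, a string with no accepting path sends the simulator into loop(T′), exactly as the proof requires.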
26.8 More variations on Turing Machines
Our Turing Machines have a distinguished set of final states, and at each stepread and write one tape symbol and move left or right.
It makes no difference to the class of languages accepted if, instead of a setof final states, we add a “halt” instruction (H):
δ : Q × (Σ ∪ Γ ∪ {∆}) → Q × (Γ ∪ {∆}) × {L, R, H}
It makes no difference to the class of languages accepted if, instead of insisting on a tape move at each step, we allow a “stay” instruction (S):
δ : Q × (Σ ∪ Γ ∪ {∆}) → Q × (Γ ∪ {∆}) × {L, R, S}
Our Turing Machines have a single tape that is infinite in both directions, a distinguished set of final states, and at each step read and write one tape symbol and move left or right.

It makes no difference to the class of languages accepted if we restrict the tape so that it is infinite in only one direction (in fact, that is Cohen’s first definition).
It makes no difference to the class of languages accepted if we allow multiple infinite tapes: the TM decides what to do based on the values beneath the head on all k tapes, and writes all k tapes simultaneously. Of course, k > 0.
Chapter 27
Summary of the course
This lecture is a summary of the whole course.
27.1 Part 0 – Algorithms and Programs
• Specifications
– signature
– preconditions
– postconditions
• Imperative languages and applicative languages
• Program verification
– assertions
– invariants
27.2 Part I – Formal languages and automata
• Definitions of alphabet, word, language . . .
• Regular expressions and regular languages
• Finite automata:
– Deterministic finite automata
– Nondeterministic finite automata
– Nondeterministic finite automata with Λ
• Kleene’s Theorem
• Pumping Lemma
• Closure properties
27.3 Part II – Context-Free Languages
• Regular grammars
• Context-free grammars
• Normal forms
• Recursive descent parsing
• LL(1) grammars
• Pushdown automata
• Deterministic and nondeterministic PDAs
• Top-down and bottom-up parsers
• Non-CF languages
• Closure properties
27.4 Part III – Turing Machines
• Origins and definition of Turing machines
• Universal machines
• Computable and computably enumerable languages
• Undecidable problems
• Chomsky hierarchy
• Variations on Turing machines
27.5 COMP 202 exam
According to the University’s www page http://www.vuw.ac.nz/timetables/exam-timetable.aspx the three-hour final exam is:
on Monday 31 October, in HMLT206, starting at 9:30am
The exam will cover the whole course but will have slightly more emphasis on the second half, as the mid-term test examined the first half.
The format will be similar, though not necessarily identical, to the last two years’ exams, which may be found on the course web site.