Lecture Two: Formal Languages
Formal Languages, Lecture 2, slide 1
Amjad Ali
Feb 16, 2016
Formal Language
A formal language is an abstraction of the general characteristics of programming languages.
It consists of a set of symbols and rules for forming sentences.
Sentences are formed by grouping the symbols.
Formal Language
A formal language is the set of all strings permitted by the rules of formation
What is a language?
A system for the expression of certain ideas, facts, or concepts, including a set of symbols and rules for their manipulation.
Mathematical definition of a language
Defining a language mathematically requires us to understand the following concepts first:
Alphabets
Strings
Concatenation of strings, etc.
Alphabet
An alphabet is a nonempty finite set of symbols.
It is denoted by Σ.
Example: Σ = {a, b}, where a and b are symbols.
Alphabets
An alphabet is any finite set of symbols.
{0,1}: binary alphabet
{0,1,2,3,4,5,6,7,8,9}: decimal alphabet
ASCII, Unicode: machine-text alphabets
Or just {a,b}: enough for many examples
{}: a legal but not usually interesting alphabet, if the empty set is allowed
We will usually use Σ as the name of the alphabet we're considering, as in Σ = {a,b}.
Strings
Strings are constructed from the individual symbols.
Strings are finite sequences of symbols from the alphabet.
Example: aabba, ababaaa, abbbaaa, etc. are strings formed from the symbols of the alphabet {a, b}.
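The definition of a string over an alphabet can be checked mechanically. The following is a minimal sketch, assuming symbols are modelled as one-character Python strings; the function name is_string_over is hypothetical and chosen only for illustration.

```python
def is_string_over(s: str, alphabet: set) -> bool:
    """Return True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)


sigma = {"a", "b"}                      # the alphabet {a, b}
print(is_string_over("aabba", sigma))   # True
print(is_string_over("abc", sigma))     # False: c is not a symbol of the alphabet
```

Python strings are always finite, so the finiteness requirement holds automatically in this model.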
Symbols And Variables
Sometimes we will use variables that stand for strings: x = abbb
In programming languages, syntax helps distinguish symbols from variables:
String x = "abbb";
In formal languages, we rely on context and naming conventions to tell them apart.
We'll use the first letters, like a, b, and c, as symbols.
The last few, like x, y, and z, will be string variables.
Assumptions
Lowercase letters a, b, c, … are used for elements of the alphabet.
Lowercase letters u, v, w, … are used for string names, e.g. w = aabbaba.
This indicates that w is a string with the specific value aabbaba.
Empty String
The empty string is written as ε.
It is like "" in some programming languages.
|ε| = 0
Don't confuse the empty set and the empty string: {} ≠ ε and {} ≠ {ε}.
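The difference between the empty string, the empty set, and the set containing only the empty string can be made concrete. This is a small sketch, assuming Python's "" stands in for the empty string and Python sets stand in for sets of strings; the variable names are illustrative only.

```python
empty_string = ""          # plays the role of the empty string
empty_set = set()          # plays the role of {}
only_empty_string = {""}   # plays the role of the set containing just the empty string

print(len(empty_string))               # 0: the empty string has no symbols
print(len(empty_set))                  # 0: the empty set contains no strings at all
print(len(only_empty_string))          # 1: one string, which happens to be empty
print(empty_set == only_empty_string)  # False
```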
Concatenation
The concatenation of two strings x and y is the string containing all the symbols of x in order, followed by all the symbols of y in order.
We show concatenation just by writing the strings next to each other.
If x = abc and y = def, then xy = abcdef.
For any x, εx = xε = x.
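Concatenation maps directly onto the + operator for Python strings. The sketch below assumes that correspondence and uses "" for the empty string; the variable names are illustrative.

```python
x = "abc"
y = "def"
epsilon = ""   # the empty string

xy = x + y
print(xy)                                      # abcdef
print(epsilon + x == x and x + epsilon == x)   # True: the empty string is the identity
print(len(xy) == len(x) + len(y))              # True: lengths add under concatenation
```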
Concatenation of the strings
Two strings are concatenated by appending the symbols of one string to the end of the other string.
Example: u = aaabbb, v = abbabba
Concatenated string: uv = aaabbbabbabba
Length of the string
The length of a string is the number of symbols in the string.
|w| = 5 if w = aabaa
The empty string has no symbols and is denoted by λ (also written ε): |λ| = 0
Kleene Star
The Kleene closure of an alphabet Σ, written as Σ*, is the language of all strings over Σ.
{a}* is the set of all strings of zero or more a's: {ε, a, aa, aaa, …}
{a,b}* is the set of all strings of zero or more symbols, each of which is either a or b: {ε, a, b, aa, bb, ab, ba, aaa, …}
x ∈ Σ* means x is a string over Σ.
Unless Σ = {}, Σ* is infinite.
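Since the Kleene closure is infinite for any nonempty alphabet, a program can only list a finite prefix of it. The following sketch enumerates all strings up to a chosen length, shortest first; the function name strings_up_to and the length bound are assumptions made for illustration.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Return all strings over alphabet of length 0..max_len, shortest first."""
    symbols = sorted(alphabet)
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every length-n sequence of symbols
        for combo in product(symbols, repeat=n):
            result.append("".join(combo))
    return result


print(strings_up_to({"a"}, 3))        # ['', 'a', 'aa', 'aaa']
print(strings_up_to({"a", "b"}, 2))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(strings_up_to(set(), 5))        # ['']: over the empty alphabet only the empty string exists
```

The last call illustrates the exceptional case: over the empty alphabet the closure is finite, containing just the empty string.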
Kleene Star
Iterating a language L:
L⁰ = {ε}
L¹ = L
L² = L·L
Lⁿ⁺¹ = Lⁿ·L
Kleene star: L* = L⁰ ∪ L¹ ∪ L² ∪ …, the union of Lⁿ over all n ≥ 0
Example: {a,b}* = {ε, a, b, aa, bb, ab, ba, aab, …}, all finite sequences over {a,b}.
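The iteration of a language can be sketched with Python sets of strings. The helper names language_power and star_up_to are hypothetical, and star_up_to is only a finite approximation of the Kleene star, since the full union has infinitely many terms.

```python
def language_power(lang, n):
    """Compute the n-th power of a language: power 0 is {""}, each step appends one more factor."""
    result = {""}
    for _ in range(n):
        result = {x + y for x in result for y in lang}
    return result


def star_up_to(lang, max_n):
    """Union of powers 0..max_n of lang: a finite approximation of the Kleene star."""
    result = set()
    for n in range(max_n + 1):
        result |= language_power(lang, n)
    return result


L = {"a", "b"}
print(sorted(language_power(L, 2)))                          # ['aa', 'ab', 'ba', 'bb']
print(sorted(star_up_to(L, 2), key=lambda s: (len(s), s)))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```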
Σ⁺ and Σ*
Σ is an alphabet.
Σ* is the set of all strings obtained by concatenating zero or more symbols from Σ.
Σ* always contains the empty string λ, and Σ⁺ = Σ* − {λ}.
Finiteness
Σ is always finite.
For a nonempty Σ, Σ* and Σ⁺ are always infinite.
Numbers
We use N to denote the set of natural numbers: N = {0, 1, …}
Exponents
For n in N, the exponent n denotes the concatenation of n copies of a string.
If x = ab, then
x⁰ = ε
x¹ = x = ab
x² = xx = abab, etc.
We use parentheses for grouping exponentiations (assuming that Σ does not contain the parentheses):
(ab)⁷ = ababababababab
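String exponentiation corresponds to Python's string repetition operator. The sketch below assumes that correspondence; the function name string_power is hypothetical.

```python
def string_power(x: str, n: int) -> str:
    """Concatenate n copies of x; n must be a natural number (0, 1, 2, ...)."""
    if n < 0:
        raise ValueError("exponent must be a natural number")
    return x * n


print(repr(string_power("ab", 0)))   # '': the zeroth power is the empty string
print(string_power("ab", 1))         # ab
print(string_power("ab", 2))         # abab
print(string_power("ab", 7))         # ababababababab
```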
Languages
A language is a set of strings over some fixed alphabet.
Not restricted to finite sets: in fact, finite sets are not usually interesting languages.
All our alphabets are finite, and all our strings are finite, but most of the languages we're interested in are infinite.
Language
A language L is defined very generally as a subset of Σ*.
A string in a language L will be called a sentence of L.
Set Formers
A set former is a set written with extra constraints or conditions limiting the elements of the set.
Not the rigorous definitions we're looking for, but a useful notation anyway:
{x ∈ {a, b}* | |x| ≤ 2} = {ε, a, b, aa, bb, ab, ba}
{xy | x ∈ {a, aa} and y ∈ {b, bb}} = {ab, abb, aab, aabb}
{x ∈ {a, b}* | x contains one a and two bs} = {abb, bab, bba}
{aⁿbⁿ | n ≥ 1} = {ab, aabb, aaabbb, aaaabbbb, ...}
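Set formers resemble Python set comprehensions, which makes the examples on this slide easy to reproduce. This is a sketch under the assumption that it is enough to search strings of bounded length; the helper strings_of_length and all variable names are hypothetical.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """All strings of exactly n symbols over the alphabet."""
    return {"".join(c) for c in product(sorted(alphabet), repeat=n)}


sigma = {"a", "b"}

# Strings over {a, b} of length at most 2
short = {x for n in range(3) for x in strings_of_length(sigma, n)}

# Every x from {a, aa} followed by every y from {b, bb}
pairs = {x + y for x in {"a", "aa"} for y in {"b", "bb"}}

# One a and two bs forces length 3, so only length-3 strings need checking
one_a_two_bs = {x for x in strings_of_length(sigma, 3)
                if x.count("a") == 1 and x.count("b") == 2}

# First four elements of the infinite set with n >= 1
an_bn_prefix = ["a" * n + "b" * n for n in range(1, 5)]

print(sorted(short, key=lambda s: (len(s), s)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(sorted(pairs))                             # ['aab', 'aabb', 'ab', 'abb']
print(sorted(one_a_two_bs))                      # ['abb', 'bab', 'bba']
print(an_bn_prefix)                              # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```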
Free Variables in Set Formers
Unless otherwise constrained, exponents in a set former are assumed to range over all of N.
Examples:
{(ab)ⁿ} = {ε, ab, abab, ababab, abababab, ...}
{aⁿbⁿ} = {ε, ab, aabb, aaabbb, aaaabbbb, ...}
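Because the exponent ranges over all natural numbers, starting at 0, both example sets are infinite and begin with the empty string. One way to sketch this is with Python generators that can be sampled lazily; the generator names ab_powers and an_bn are hypothetical.

```python
from itertools import count, islice


def ab_powers():
    """Yield ab repeated n times for n = 0, 1, 2, ..."""
    for n in count(0):
        yield "ab" * n


def an_bn():
    """Yield n a's followed by n b's for n = 0, 1, 2, ..."""
    for n in count(0):
        yield "a" * n + "b" * n


# islice takes a finite prefix of an infinite generator
print(list(islice(ab_powers(), 4)))   # ['', 'ab', 'abab', 'ababab']
print(list(islice(an_bn(), 4)))       # ['', 'ab', 'aabb', 'aaabbb']
```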
The Quest
Set formers are relatively informal.
They can be vague, ambiguous, or self-contradictory.
A big part of our quest in the study of formal languages is to develop better tools for defining languages.
Problem
Σ = {a,b}
Σ* = {λ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, …}
L = {a, aa, aab} is a language on Σ, since L is a subset of Σ*; it is finite.
L = {aⁿbⁿ : n > 0} is also a subset of Σ*, but it is infinite.
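A finite language can be stored as a set, while an infinite one has to be represented by a membership test. The sketch below illustrates both for the two languages on this slide; the function names is_over and in_an_bn are hypothetical.

```python
def is_over(s, alphabet):
    """True if s uses only symbols from the alphabet, i.e. s is in the Kleene closure."""
    return all(symbol in alphabet for symbol in s)


def in_an_bn(s):
    """Membership test for strings of n a's followed by n b's, with n > 0."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n


sigma = {"a", "b"}
finite_language = {"a", "aa", "aab"}

# Every sentence of the finite language is a string over the alphabet
print(all(is_over(s, sigma) for s in finite_language))   # True
print(in_an_bn("aabb"))   # True
print(in_an_bn("aab"))    # False
print(in_an_bn(""))       # False, because n must be greater than 0
```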
Concatenation of two Languages
L₁L₂ = {xy : x ∈ L₁ and y ∈ L₂}
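For finite languages, this definition translates into a single set comprehension. The following is a minimal sketch, assuming languages are represented as Python sets of strings; the function name concat_languages is hypothetical.

```python
def concat_languages(l1, l2):
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


l1 = {"a", "aa"}
l2 = {"b", "bb"}
print(sorted(concat_languages(l1, l2)))    # ['aab', 'aabb', 'ab', 'abb']
print(concat_languages(l1, set()))         # set(): nothing to pair with
print(concat_languages(l1, {""}) == l1)    # True: the language holding only the empty string acts as identity
```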