Radboud University Nijmegen
Formal Languages, Grammars and Automata
Helle Hvid Hansen
[email protected] http://www.cs.ru.nl/~helle/
Foundations Group – Intelligent Systems Section
Institute for Computing and Information Sciences
Radboud University Nijmegen
25 April 2014
Outline: Organisation, Formal Languages, Regular Expressions and Regular Languages, Conclusion
Slides: cs.ru.nl/is/education/courses/2014/formal-languages/slides/lec1.pdf
• Logic languages, e.g. first-order logic: ∀x ∈ N ∃y ∈ N : y > x
• ...
A language consists of words (or strings). Words are sequences of letters/symbols from an alphabet.
Alphabet
Def. An alphabet is a finite set, often denoted Σ. Elements of an alphabet are called letters or symbols.
Examples:
Σ1 = {a}
Σ2 = {0, 1}
Σ3 = {A, C, G, T}
Σ4 = {a, b, c, d, . . . , x, y, z}
Σ5 = Chinese alphabet: approx. 40,000 symbols
Σ6 = {+, ×, −, 0, 1, 2, 3, . . .}: the mathematical “alphabet” is countably infinite, so it is not an alphabet.
Words/Strings
Def. Given an alphabet Σ,
• a word (or string) over Σ is a finite sequence of letters from Σ.
• the empty word (i.e. sequence of length 0) is denoted by λ.
• the set of all words over Σ is denoted by Σ∗.
Examples:
• x − (y − x) = 2x − y is a word of length 12 over the alphabet Σ = {x, y, −, +, (, ), =, 0, 1, 2}
• The students will do their homework is a word of length 11 over the alphabet Σ = {The, students, will, do, their, homework, ␣, a, b, c}, where ␣ denotes the blank symbol
Inductive Definition of Words
Inductive definition of Σ∗:
Σ∗ is the smallest set satisfying the following rules:
1. λ ∈ Σ∗.
2. If w ∈ Σ∗ and a ∈ Σ, then wa ∈ Σ∗ (or equivalently, aw ∈ Σ∗).
(Why is “smallest set” important?)
Properties of Σ∗:
• Σ∗ ≠ ∅ (why?)
• If Σ ≠ ∅, then Σ∗ is infinite
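The two rules can be read as a recipe for generating words level by level. The following is a minimal sketch, assuming words are represented as Python strings (so λ is the empty string) and letters as one-character strings; the function name `words_up_to` is made up for illustration. Since Σ∗ is infinite for a non-empty Σ, only words up to a given length are produced.

```python
def words_up_to(alphabet, max_len):
    """Return all words over `alphabet` of length <= max_len, shortest first."""
    level = [""]      # rule 1: the empty word λ is in Σ∗
    result = [""]
    for _ in range(max_len):
        # rule 2: if w is a word and a is a letter, then wa is a word
        level = [w + a for w in level for a in sorted(alphabet)]
        result.extend(level)
    return result


print(words_up_to({"a", "b"}, 2))       # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(len(words_up_to({"0", "1"}, 3)))  # 15
```

Each pass of the loop applies rule 2 once to every word of the previous level, so the number of words grows with every extra length allowed.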
Definitions by Induction on Words
• Definition of Σ∗ says: w is a word if and only if
w = λ or w = va for some word v and letter a.
• We can define a function f on words by defining f(w) by induction on w (distinguish cases for f(w)):
  Base case (w = λ): f(λ) = ...
  Inductive case (w = va): f(va) = ... (may use f(v))
• If f takes several arguments, we can choose one for the induction, for example, define f(u, w) by induction on w (u is fixed with respect to the induction).
Concatenation of Words
• Given words u = ab and w = bc over Σ = {a, b, c}, we can concatenate them to create new words:
  u · w = abbc, w · u = bcab, u · u = abab
• Concatenation is a binary operation · on words.
• We define u · w by induction on w. For all u ∈ Σ∗,
  Base case: u · λ = u
  Inductive case: u · va = (u · v)a for all v ∈ Σ∗ and a ∈ Σ.
  (We will often write uv instead of u · v)
• Some properties: u(vw) = (uv)w, λu = u
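The inductive definition of concatenation translates almost literally into a recursive function. This is a sketch under the assumption that words are Python strings and that appending a single letter (`+ a` below) plays the role of the primitive step "wa" from the definition of Σ∗; the name `concat` is chosen here for illustration.

```python
def concat(u, w):
    """u · w, defined by induction on w (u stays fixed)."""
    if w == "":                 # base case: u · λ = u
        return u
    v, a = w[:-1], w[-1]        # inductive case: split w as va
    return concat(u, v) + a     # u · va = (u · v)a


u, w = "ab", "bc"
print(concat(u, w), concat(w, u), concat(u, u))  # abbc bcab abab
```

The recursion only ever inspects the second argument, which mirrors the remark that u is fixed with respect to the induction.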
More Operations
• Reversal of words, e.g. (abc)^R = cba.
  Define w^R by induction on w:
  Base case: λ^R = λ
  Inductive case: (va)^R = a · v^R for all v ∈ Σ∗ and a ∈ Σ
• Repeating a word: E.g. (ab)^2 = abab, (ab)^3 = ababab, etc.
  Define u^n by induction on n ∈ N (!)
  u^0 = λ and u^(n+1) = u · u^n
  (Base case: n = 0, Inductive case: n = n′ + 1.)
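Both definitions can be sketched as recursive functions, one recursing on the word and one on the number. The sketch assumes words are Python strings and uses the built-in string `+` for concatenation; the names `reverse` and `power` are illustrative only.

```python
def reverse(w):
    """w^R, by induction on the word w."""
    if w == "":
        return ""                 # base case: λ^R = λ
    v, a = w[:-1], w[-1]          # split w as va
    return a + reverse(v)         # (va)^R = a · v^R


def power(u, n):
    """u^n, by induction on the natural number n."""
    if n == 0:
        return ""                 # base case: u^0 = λ
    return u + power(u, n - 1)    # u^(n+1) = u · u^n


print(reverse("abc"))     # cba
print(power("ab", 3))     # ababab
```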
Counting Occurrences and Length
• |w|_a is the number of occurrences of the letter a in the word w.
  E.g., |λ|_a = 0, |abb|_a = 1, |abb|_b = 2.
  Define by induction on w:
  |λ|_a = 0
  |vb|_a = |v|_a + 1 if a = b
  |vb|_a = |v|_a if a ≠ b
• |w| is the length of the word w. E.g., |abb| = 3.
  (Exercise: define it by induction)
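The case distinction in the definition of the occurrence count can be sketched as follows, again assuming words are Python strings; the name `count` is illustrative. The length function is deliberately not given here, since it is the exercise.

```python
def count(w, a):
    """|w|_a: the number of occurrences of letter a in word w, by induction on w."""
    if w == "":
        return 0                  # |λ|_a = 0
    v, b = w[:-1], w[-1]          # split w as vb
    if a == b:
        return count(v, a) + 1    # |vb|_a = |v|_a + 1 if a = b
    return count(v, a)            # |vb|_a = |v|_a     if a ≠ b


print(count("", "a"), count("abb", "a"), count("abb", "b"))  # 0 1 2
```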
Proof by Induction
Prove that some property P holds for all words.
For example, P(u, v) could be |uv| = |u| + |v|.
A proof by induction works as follows:
• Base case: Show that P holds for λ (in the example: P(u, λ)).
• Induction Hypothesis (IH): Assume that P(u, v) holds for all words v of length < n.
• Show that P(u, w) holds for words w of length n (you may use the IH).
We conclude by induction that P(u, w) holds for all words u, w.
See the lecture notes by Silva for more examples. See also the exercises of this week.
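As a worked instance of this scheme, here is a sketch of the proof of |uv| = |u| + |v|. It assumes that length is defined inductively by |λ| = 0 and |va| = |v| + 1 (the slides leave this definition as an exercise, so it is an assumption here). The inductive case writes the second word as wa; the IH applies to w because |w| < |wa|.

```latex
\begin{align*}
\textbf{Base case } (v = \lambda):\quad
  |u \cdot \lambda| &= |u| = |u| + 0 = |u| + |\lambda| \\[4pt]
\textbf{Inductive case } (v = wa):\quad
  |u \cdot wa| &= |(u \cdot w)a| && \text{definition of concatenation} \\
               &= |u \cdot w| + 1 && \text{definition of length} \\
               &= (|u| + |w|) + 1 && \text{IH} \\
               &= |u| + (|w| + 1) \\
               &= |u| + |wa|      && \text{definition of length}
\end{align*}
```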
Formal Language
Def. A language L over Σ is a set of words over Σ, that is, L ⊆ Σ∗.
Examples:
• ∅, {λ} are languages over any Σ.
• L1 = {a^n ∈ {a, b}∗ | n ∈ N is even}
• L2 = {a^n b^n ∈ {a, b}∗ | n ∈ N}
• L3 = {a^n b^n c^n ∈ {a, b, c}∗ | n ∈ N}
• L4 = {a^n ∈ {a}∗ | n ∈ N is prime}
• L5 = {w ∈ {0, 1}∗ | w is binary representation of a prime}
• L6 = {e | e is a well-formed arithmetical expression}
• L7 = {P | P is a syntactically correct Java program}
• L8 = {S | S is a grammatically correct English sentence}
Operations on Languages
Let L, L1, L2 ⊆ Σ∗.
Concatenation: L1L2 = {uv ∈ Σ∗ | u ∈ L1, v ∈ L2}
Reversal: L^R = {u^R ∈ Σ∗ | u ∈ L}
Union: L1 ∪ L2 = {u ∈ Σ∗ | u ∈ L1 or u ∈ L2}
Intersection: L1 ∩ L2 = {u ∈ Σ∗ | u ∈ L1 and u ∈ L2}
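For finite languages these operations can be tried out directly. The sketch below assumes a language is represented as a Python set of strings, which only works for finite languages; the helper names `concat_lang` and `reverse_lang` and the sample languages are made up for illustration.

```python
def concat_lang(l1, l2):
    """L1 L2 = {uv | u in L1, v in L2}."""
    return {u + v for u in l1 for v in l2}


def reverse_lang(l):
    """L^R = {u^R | u in L}."""
    return {u[::-1] for u in l}


L1 = {"a", "ab"}
L2 = {"", "b"}          # "" stands for the empty word λ

print(sorted(concat_lang(L1, L2)))   # ['a', 'ab', 'abb']
print(sorted(reverse_lang(L1)))      # ['a', 'ba']
print(sorted(L1 | L2))               # union: ['', 'a', 'ab', 'b']
print(sorted(L1 & L2))               # intersection: []
```

Note that the concatenation has three words rather than four, because ab arises both as a · b and as ab · λ.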
Regular Expressions
Def. The set RegEx(Σ) of regular expressions over Σ is the smallest set satisfying:
1. 0, 1 and all a ∈ Σ are in RegEx(Σ).
2. If r, s ∈ RegEx(Σ) then also (r + s), rs, (r)∗ are in RegEx(Σ).
• We assume 0, 1 are not in Σ.
• We will omit parentheses by using the convention that ∗ binds stronger than concatenation, which binds stronger than +. E.g., we write r + st∗ instead of (r + s(t)∗).
Regular Languages
Def. The language L(e) denoted by a regular expression e ∈ RegEx(Σ) is defined inductively by:
L(0) = ∅
L(1) = {λ}
L(a) = {a} for all a ∈ Σ
L(rs) = L(r)L(s)
L(r + s) = L(r) ∪ L(s)
L(r∗) = L(r)∗
Def. A language L ⊆ Σ∗ is regular if there exists a regular expression e ∈ RegEx(Σ) such that L = L(e).
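The inductive definition of L(e) can be followed clause by clause in code. The sketch below rests on several assumptions that are not in the slides: expressions are encoded as nested Python tuples (`"0"`, `"1"`, a one-letter string, `("+", r, s)`, `(".", r, s)`, `("*", r)`), the star of a language is taken to be the set of all finite concatenations of its words (the usual definition, which this transcript does not spell out), and, because L(e) may be infinite, only the words of length at most n are computed.

```python
def lang(e, n):
    """Words of length <= n in L(e), following the inductive definition."""
    if e == "0":
        return set()                                   # L(0) = ∅
    if e == "1":
        return {""}                                    # L(1) = {λ}
    if isinstance(e, str):
        return {e} if len(e) <= n else set()           # L(a) = {a}
    op = e[0]
    if op == "+":
        return lang(e[1], n) | lang(e[2], n)           # L(r + s) = L(r) ∪ L(s)
    if op == ".":
        return {u + v for u in lang(e[1], n)           # L(rs) = L(r)L(s)
                for v in lang(e[2], n) if len(u + v) <= n}
    if op == "*":
        base = lang(e[1], n)                           # L(r∗) = L(r)∗
        result, frontier = {""}, {""}
        while frontier:                                # append words until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression: {e!r}")


# a(0 + 1 + b)∗ in the tuple encoding
e = (".", "a", ("*", ("+", ("+", "0", "1"), "b")))
print(sorted(lang(e, 3)))   # ['a', 'ab', 'abb']
```

Cutting off at length n is safe for concatenation and star because every word of length at most n is built from pieces that are themselves no longer than n.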
Examples of Regular Languages
Let Σ = {a, b}.
regular expression e      language L(e)
a + b                     {a, b} = Σ
(a + b)∗                  all words over Σ (Σ∗)
a(a + b)∗                 all words that begin with a
b∗(a + 1)b∗               all words that contain zero or one a
a(0 + 1 + b)∗             {a, ab, abb, abbb, . . .}
(ab∗)∗0                   the empty language (∅)
((a + b)(a + b))∗         all words of even length
(a∗b)∗a∗                  Σ∗
Def. Two regular expressions r and s are equivalent if L(r) = L(s).
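Claims about L(e) and about equivalence can be sanity-checked by brute force on short words. The sketch below uses Python's `re` module as a stand-in matcher, which is an assumption of convenience and not the notation of the slides: + is written `|`, 1 is written as an empty alternative, and the expression 0 is left out because it has no direct counterpart. Agreement on all words up to length n is evidence only, not a proof of equivalence. The helper names and the particular expression pairs are chosen here for illustration.

```python
import re
from itertools import product


def words(sigma, n):
    """All words over sigma of length <= n."""
    for k in range(n + 1):
        for letters in product(sigma, repeat=k):
            yield "".join(letters)


def agree_up_to(p, q, sigma="ab", n=6):
    """True if patterns p and q accept exactly the same words of length <= n."""
    return all(bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
               for w in words(sigma, n))


# b∗(a + 1)b∗ should accept exactly the words with at most one a
print(all(bool(re.fullmatch(r"b*(a|)b*", w)) == (w.count("a") <= 1)
          for w in words("ab", 6)))                 # True

print(agree_up_to(r"(a|b)*", r"(a*b*)*"))           # True
print(agree_up_to(r"a(a|b)*", r"(a|b)*"))           # False: λ is only in the second
```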
Some Questions
1. Given a word w and a regular expression e, is there an algorithm that computes whether w ∈ L(e)?
2. Given regular expressions e1, e2 over the same alphabet, is there an algorithm that computes whether L(e1) = L(e2)?
3. Are all languages regular? If not, then how can we prove that some L is not regular?
Summary
Learning goals of today:
• Notion of formal language
• Operations on words and languages
• Regular expressions for specifying regular languages
Remark on Notation in Lecture Notes
                      [Silva]        [Pitts]
alphabet              A, B           Σ
empty word/string     λ              ε
regular expressions   0, 1, r + s    ∅, ε, r|s
What is this course (not) about?
[Diagram: Formal Languages, Grammars (generators), Automata (acceptors)]
This course: regular, context-free languages and their automata and grammars.
Other courses: context-sensitive, recursively enumerable languages, Turing machines.