Languages and Machines
Unit one: Formal Languages
What is a language?
• means of communication
• linear
• can translate between two languages
• has containment structure of symbols
• has atomic (terminal) symbols from which all others can be made
Hierarchy of structures
Paragraph (1..) Sentence (1..) Word (1..) Letter
(each structure contains at least one of the next)
What is a machine?
• a language* processor
• interprets a language
• translates one language to another
• modifies an expression in a language
* a language can have graphical symbols
Examples of languages
• English
• German
• Java
• Mathematics
• UML
Example of language: English
The cat sat on the mat.
(figure labels: symbol, string)
Example of language: German
Die Katze saß auf dem Teppich.
Example of language: Java
while (sunny) mat.sitOn(myCat);
Example of language: Maths
c : Cat ● isOnMat(c)
Example of language: UML
• Note: this two-dimensional diagram could be translated to a one-dimensional string (e.g. XML)
(class diagram)
Class Cat: operations feed, stroke
Class Mat: operation sitOn(c : Cat)
UML in XML
<uml>
  <class>
    <height>2.1</height>
    <width>3.5</width>
    <position x="0.6" y="3.1"/>
    <name>'Cat'</name>
    <attributes></attributes>
    <operations>
      <operation>'feed'</operation>
      <operation>'stroke'</operation>
    </operations>
  </class>
  <class> ... </class>
</uml>
UML in XML without layout
<uml> <class> <height> 2.1 </height> <width> 3.5 </width> <position x="0.6" y="3.1"/> <name> 'Cat' </name> <attributes> </attributes> <operations> <operation> 'feed' </operation> <operation> 'stroke' </operation> </operations> </class> <class> ... </class> </uml>
• no information lost
• a linear string of symbols
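The claim that removing layout loses no information can be checked with a small Python sketch. It assumes a well-formed variant of the slide's XML (attribute values quoted, `<position>` self-closed, the elided second class left out); the names `WITH_LAYOUT`, `WITHOUT_LAYOUT` and `structure` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the slide's XML, with layout (indentation).
WITH_LAYOUT = """
<uml>
  <class>
    <height>2.1</height>
    <width>3.5</width>
    <position x="0.6" y="3.1"/>
    <name>'Cat'</name>
    <attributes></attributes>
    <operations>
      <operation>'feed'</operation>
      <operation>'stroke'</operation>
    </operations>
  </class>
</uml>
"""

# The same document flattened to one line: a linear string of symbols.
WITHOUT_LAYOUT = " ".join(WITH_LAYOUT.split())


def structure(elem):
    """Return tag, attributes, text and children, ignoring layout whitespace."""
    text = (elem.text or "").strip()
    return (elem.tag, sorted(elem.attrib.items()), text,
            [structure(child) for child in elem])


tree_a = structure(ET.fromstring(WITH_LAYOUT.strip()))
tree_b = structure(ET.fromstring(WITHOUT_LAYOUT))

print(tree_a == tree_b)        # True: both strings describe the same structure
print("\n" in WITHOUT_LAYOUT)  # False: the flattened form has no line breaks
```

Both versions parse to the same nested structure, which is one way of saying that the layout carried no information of its own.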
Activity
• write down an example of a language, and state what its symbols are
Formal language
• has a collection of symbols
• is this a set or sequence?
• a set
• is it a finite or infinite set?
• finite
• we call this set the alphabet
Activity
Write down the alphabet for
1. the language of binary numbers
2. the language of ordinary decimal numbers
3. the language of traffic lights
4. the language of simple diagrams made up of circles, rectangles and straight lines
Suggested answer
1. {0,1}
2. {0,1,2,3,4,5,6,7,8,9}
3. {red, amber, green}
4. {circle, rectangle, line}
Formal language
• a formal language is just a set of strings
• is the set finite?
• maybe, it depends on the language
• e.g. 1: the language consisting of all two-bit strings is finite: {00,01,10,11}
• e.g. 2: the language consisting of all binary strings is infinite: given any finite set of strings, we can always add another string by taking the longest string and adding one more bit to it
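Both examples can be tried out in a short Python sketch. The helper names `strings_of_length` and `extend` are hypothetical; the second one follows the argument above by taking a longest string and adding one more bit.

```python
from itertools import product

ALPHABET = ("0", "1")  # the alphabet of binary numbers


def strings_of_length(n, alphabet=ALPHABET):
    """Return the set of all strings of exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(alphabet, repeat=n)}


def extend(finite_language):
    """Return a binary string that is not in the given finite, non-empty set.

    The result is longer than every string in the set, so it cannot be a member.
    """
    longest = max(finite_language, key=len)
    return longest + "0"


two_bit = strings_of_length(2)
print(sorted(two_bit))            # ['00', '01', '10', '11']

new_string = extend(two_bit)
print(new_string not in two_bit)  # True
```

However large a finite set of binary strings is, `extend` always finds a string outside it, which is why no finite set can be the whole language of binary strings.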
Classes of formal language
(nested classes, innermost first; each class is contained in the next)
regular
context-free
context-sensitive
phrase structure
Classes of language: examples
• The man bites the dog. (Context-free)
• ((5 + 3) / (5 - 1)) * (7 - 5) (Context-free)
• NUMBER_OF_PAWS (Regular)
Regular languages
1. the empty set is a regular language
2. the set consisting of the empty string (ε) is a regular language
3. the set consisting of a one-symbol string is a regular language
4. a new regular language can be made by concatenation: take every string from one regular language and join it to every string from a second regular language
5. a new regular language can be made by taking the union of two regular languages
6. a new regular language can be made by repetition: concatenate zero or more strings taken from a regular language (this is what allows infinite regular languages)
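Rules 1 to 5 can be sketched directly with Python sets of strings. This is only an illustration for finite languages; the function names `symbol`, `concat` and `union` are hypothetical.

```python
EMPTY_LANGUAGE = frozenset()          # rule 1: the empty set
EMPTY_STRING_ONLY = frozenset({""})   # rule 2: only the empty string


def symbol(s):
    """Rule 3: the language containing a single one-symbol string."""
    return frozenset({s})


def concat(a, b):
    """Rule 4: every string of a followed by every string of b."""
    return frozenset(x + y for x in a for y in b)


def union(a, b):
    """Rule 5: every string that is in a or in b."""
    return a | b


# Build "all two-bit strings" from the one-symbol languages {0} and {1}.
bit = union(symbol("0"), symbol("1"))
two_bit = concat(bit, bit)
print(sorted(two_bit))  # ['00', '01', '10', '11']
```

Concatenating with `EMPTY_STRING_ONLY` leaves a language unchanged, while concatenating with `EMPTY_LANGUAGE` gives the empty set, which shows why rules 1 and 2 describe two different languages.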
Examples of regular languages
1. the set of all two-bit strings
2. the set of all English words
3. the set of all decimal integers
4. the set of Java identifiers
You don't believe me?...
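One way to make three of these examples plausible is to write a regular expression for each, here with Python's `re` module. The patterns are assumptions made for illustration: the integer pattern covers unsigned integers only, and the identifier pattern is an ASCII-only simplification (real Java identifiers also allow other Unicode letters and exclude keywords).

```python
import re

# Hypothetical patterns, one per example language.
TWO_BIT = re.compile(r"[01]{2}")
DECIMAL_INTEGER = re.compile(r"[0-9]+")                     # unsigned only
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")   # ASCII-only sketch


def in_language(pattern, string):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(string) is not None


print(in_language(TWO_BIT, "10"))                      # True
print(in_language(TWO_BIT, "101"))                     # False
print(in_language(DECIMAL_INTEGER, "2024"))            # True
print(in_language(JAVA_IDENTIFIER, "NUMBER_OF_PAWS"))  # True
print(in_language(JAVA_IDENTIFIER, "9lives"))          # False
```

Each pattern uses only sequence, alternatives (the character classes) and repetition. The set of all English words is finite, so it could be written as one long list of alternatives.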
Activity
How many strings do the following regular languages contain?
1. all the possible three-bit strings
2. all the single-digit decimal numbers
3. all the possible repetitions of the traffic-light sequence (red, amber, green, amber)
Suggested Answers
1. 8
2. 10
3. infinite
Recognizing regular languages
• regular languages can be recognized and interpreted by a finite-state machine
• for example, here is a machine to recognize a two-bit string:
(state diagram: two successive transitions, each labelled 0 or 1, leading to the acceptor states)
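A finite-state recognizer for two-bit strings can be sketched as a transition table in Python. The state names are hypothetical, and this version merges the states reached by 0 and by 1, so it may have fewer states than the machine drawn on the slide.

```python
# Transition table: (current state, input symbol) -> next state.
# The state names simply record how many bits have been read so far.
TRANSITIONS = {
    ("start", "0"): "one_bit",
    ("start", "1"): "one_bit",
    ("one_bit", "0"): "two_bits",
    ("one_bit", "1"): "two_bits",
}
ACCEPTING = {"two_bits"}  # acceptor states


def accepts(string, transitions=TRANSITIONS, start="start", accepting=ACCEPTING):
    """Run the machine over the string and report whether it ends in an acceptor state."""
    state = start
    for sym in string:
        key = (state, sym)
        if key not in transitions:
            return False  # no transition for this symbol: reject
        state = transitions[key]
    return state in accepting


print(accepts("01"))   # True
print(accepts("1"))    # False: too short
print(accepts("011"))  # False: too long
```

The machine needs no memory beyond its current state, which is the defining feature of a finite-state machine.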
Finite-state machines
• Here is a machine that recognizes a bit-string of any length:
(state diagram: transitions labelled 0 and 1)
Can you simplify this machine?
Summary
• A language, formally speaking, is a set of strings. The set may be finite or infinite.
• A string is a finite sequence of symbols. The minimum length of a string is zero (the empty string).
• A symbol is just a mark or shape that conveys meaning. It is a member of a finite alphabet.
• A regular language has very simple formation rules involving sequences, repetitions and alternatives.
• A context-free language is a language in which each kind of phrase has the same structure, irrespective of where it is in the string of phrases.
• We speak in natural languages, which are not strictly formal languages. The grammar rules of natural languages are more or less context-free.
• We use computer languages. The syntax of the usual computer languages is, for the most part, context-free.