Module 1 : Introduction to the theory of computation – Set
theory – Definition of sets – Properties – Countability –
Uncountability – Equinumerous sets – Functions – Primitive
recursive and partial recursive functions – Computable and
non-computable functions
Module 1
Theory of computation
The theory of computation is the branch of computer science that
deals with whether and how efficiently problems can be solved on a
model of computation, using an algorithm. The field is divided into
two major branches: computability theory and complexity theory, but
both branches deal with formal models of computation.
In order to perform a rigorous study of computation, computer
scientists work with a mathematical abstraction of computers called
a model of computation. There are several models in use, but the
most commonly examined is the Turing machine. A Turing machine can
be thought of as a desktop PC with a potentially infinite memory
capacity, though it can only access this memory in small discrete
chunks.
Computer scientists study the Turing machine because it is
simple to formulate, can be analyzed and used to prove results, and
because it represents what many consider the most powerful possible
"reasonable" model of computation. It might seem that the
potentially infinite memory capacity is an unrealizable attribute,
but any decidable problem solved by a Turing machine will always
require only a finite amount of memory. So in principle, any
problem that can be solved (decided) by a Turing machine can be
solved by a computer that has a bounded amount of memory.
1.1 Computability theory
Computability theory deals primarily with the question of
whether a problem is solvable at all on a computer. The statement
that the halting problem cannot be solved by a Turing machine is
one of the most important results in computability theory, as it is
an example of a concrete problem that is both easy to formulate and
impossible to solve using a Turing machine. Much of computability
theory builds on the halting problem result.
The next important step in computability theory was Rice's
theorem, which states that for all non-trivial properties of
partial functions, it is undecidable whether a Turing machine
computes a partial function with that property.
Computability theory is closely related to the branch of
mathematical logic called recursion theory, which removes the
restriction of studying only models of computation which are close
to physically realizable. Many mathematicians and computability
theorists who study recursion theory refer to it as computability
theory. There is no real difference between the fields other than
whether a researcher working in this area has his or her office in
a computer science or a mathematics department.
1.2 Complexity theory
Complexity theory considers not only whether a problem can be
solved at all on a computer, but also how efficiently the problem
can be solved. Two major aspects are considered: time complexity
and space complexity, which measure, respectively, how many steps
it takes to perform a computation and how much memory is required
to perform that computation.
In order to analyze how much time and space a given algorithm
requires, computer scientists express the time or space required to
solve the problem as a function of the size of the input problem.
For example, finding a particular number in a long list of numbers
becomes harder as the list of numbers grows larger. If we say there
are n numbers in the list, then if the list is not sorted or
indexed in any way we may have to look at every number in order to
find the number we're seeking. We thus say that in order to solve
this problem, the computer needs to perform a number of steps that
grows linearly in the size of the problem.
To simplify such analysis, computer scientists have adopted Big O
notation, which allows functions to be compared in a way that
ensures that particular aspects of a machine's construction do not
need to be considered, but rather only the asymptotic behavior as
problems become large. So in our previous example we might say that
the problem requires O(n) steps to solve.
Perhaps the most important open problem in all of computer
science is the question of whether a certain broad class of
problems denoted NP can be solved efficiently. This is the famous
question of whether P equals NP.
1.3 Set Theory
A set is a collection of elements having a property that
characterizes those elements. One way to define a set is to
enumerate the elements completely: all the elements belonging to
the set are explicitly given. Example: A = {1,2,3,4,5}. An
alternate way is to give the properties that characterize the
elements of the set. Example: B = {x | x is a positive integer
less than or equal to 5}. Some sets can also be
defined recursively.
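The two styles of definition above can be mirrored directly in a language with set types; a minimal sketch in Python (used here only for illustration):

```python
# A set given by enumeration, and the same set given by a
# characteristic property (a positive integer <= 5).
A = {1, 2, 3, 4, 5}
B = {x for x in range(1, 100) if x <= 5}
print(A == B)  # True: the two definitions describe the same set
```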
1.3.1 Set terminology
Belongs To
x ∈ B means that x is an element of set B. Using this notation we
can specify the set {0,1,2,3,4,5}, call it Z, by writing
Z = {x | x ∈ N ∧ x ≤ 5}, where N represents the set of natural
numbers.
It is read as "the set of natural numbers that are less than or
equal to 5".
Subset
Let A and B be two sets.
A is a subset of B, if every element of A is an element of B.
This is represented as A ⊆ B.
Note: If A is a subset of B and B is a subset of A, then A = B.
Also, if A is a subset of, but not equal to, B, then A is a proper
subset of B, represented as A ⊂ B.
Universal Set
The set U of all the elements we might ever consider in the
discourse is called the universal set.
Complement
If A is a set, then the complement of A is the set consisting of
all elements of the universal set that are not in A. It is denoted
by A'. Thus
A' = { x | x ∈ U ∧ x ∉ A }, where ∉ means "is not an element of".
Example:
If U is the set of natural numbers and A =
{ 1,2,3 }, then A' = { x | x ∈ U ∧ x > 3 }.
Set Operations
The operations that can be performed on sets are:
1.
Union
If A and B are two sets, then the union of A and B is the set that
contains all the elements that are in A or in B, including the
ones in both. It is denoted by A ∪ B.
Example: If A = {1,2,3} and B = {3,4,5}
then A ∪ B = {1,2,3,4,5}
2. Difference
If A and B are two sets, then the difference of A from B is the set
that consists of the elements of A that are not in B. It is
denoted by A - B.
Example: If A = {1,2,3} and B = {3,4,5}
then A - B = {1,2}
Note that in general A - B ≠ B - A.
For A and B of the above example, B - A = {4,5}.
3. Intersection
If A and B are two sets, then the intersection of A and B is the
set that consists of the elements in both A and B. It is denoted
by A ∩ B.
Example: If A = {1,2,3,8} and B = {3,4,5,8}
then A ∩ B = {3,8}.
Disjoint sets
A and B are said to be disjoint if they contain no elements in
common,
i.e. A ∩ B = ø, where ø is the empty set.
Example: A = { 1,2,3,4,5 } and
B = { 6,8,9 } are disjoint.
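The operations above can be tried out directly; a small Python sketch reusing the example sets from the text:

```python
A, B = {1, 2, 3}, {3, 4, 5}
print(A | B)   # union: {1, 2, 3, 4, 5}
print(A - B)   # difference: {1, 2}
print(B - A)   # difference the other way: {4, 5}

C, D = {1, 2, 3, 8}, {3, 4, 5, 8}
print(C & D)   # intersection: {3, 8}

print({1, 2, 3, 4, 5}.isdisjoint({6, 8, 9}))  # True: no common elements
```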
Following is a list of some standard set identities, where A, B, C
represent arbitrary sets, ø is the empty set, and U is the
universal set.
The Commutative laws:
A ∪ B = B ∪ A
A ∩ B = B ∩ A
The Associative laws:
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
The Distributive laws:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
The Idempotent laws:
A ∪ A = A
A ∩ A = A
The Absorptive laws:
A ∪ (A ∩ B) = A
A ∩ (A ∪ B) = A
The De Morgan laws:
(A ∪ B)' = A' ∩ B'
(A ∩ B)' = A' ∪ B'
Other laws involving Complements:
( A' )' = A
A ∩ A' = ø
A ∪ A' = U
Other laws involving the empty set:
A ∪ ø = A
A ∩ ø = ø
Other laws involving the Universal Set:
A ∪ U = U
A ∩ U = A
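Each identity can be checked mechanically for particular sets; a quick Python sanity check (the universal set U here is an arbitrary finite choice for illustration):

```python
U = set(range(10))          # an illustrative universal set
A, B = {1, 2, 3}, {3, 4, 5}

def comp(S):                # complement relative to U
    return U - S

assert comp(A | B) == comp(A) & comp(B)   # De Morgan: (A u B)' = A' n B'
assert comp(A & B) == comp(A) | comp(B)   # De Morgan: (A n B)' = A' u B'
assert A | (A & B) == A                   # absorption
assert A & (A | B) == A                   # absorption
assert A | A == A and A & A == A          # idempotence
```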
1.3.2 Generalized Set Operations
Union, intersection and Cartesian product of sets are associative.
For example, (A ∪ B) ∪ C = A ∪ (B ∪ C) holds. To denote either of
these expressions we often use A ∪ B ∪ C. This can be generalized
for the union of any finite number of sets as A1 ∪ A2 ∪ ... ∪ An,
which we write as ⋃ᵢ Ai.
This generalized union of sets can be rigorously defined as
follows:
Definition (⋃ᵢ Ai, for i = 1 to n):
Basis Clause: For n = 1, ⋃ᵢ Ai = A1.
Inductive Clause: ⋃ᵢ Ai (for i = 1 to n+1) = (⋃ᵢ Ai for i = 1 to n) ∪ An+1.
Similarly the generalized intersection ⋂ᵢ Ai and generalized
Cartesian product A1 x A2 x ... x An can be defined.
Based on these definitions, De Morgan's law on set union and
intersection can also be generalized as follows:
Theorem (Generalized De Morgan):
(⋃ᵢ Ai)' = ⋂ᵢ Ai', and
(⋂ᵢ Ai)' = ⋃ᵢ Ai'
1.4
Relations
Let A and B be sets. A binary
relation from A into B is any subset of the Cartesian product A x
B.
Example 1: Assume that a person owns three shirts and two
pairs of slacks. More precisely, let A = {blue shirt, black shirt,
mint green shirt} and B = {gray slacks, tan slacks}. Then A x B is
the set of all six possible combinations of shirts and slacks
that the individual can wear. However, the individual may wish to
restrict himself to combinations which are color coordinated,
or "related". This may not be all possible pairs in A x B but will
certainly be a subset of A x B. For example, one such subset may be
{ (blue shirt, gray slacks), (black shirt, tan slacks), (mint green
shirt, tan slacks) }.
Example 2: Let A = {2, 3, 5, 6} and define a relation R from A
into A by (a, b) ∈ R if and only if a divides evenly into b. So, R
= {(2, 2), (3, 3), (5, 5), (6, 6), (2, 6), (3, 6)}. A typical
element in R is an ordered pair (x, y). In some cases R can be
described by actually listing the pairs which are in R, as in the
previous example. This may not be convenient if R is relatively
large. Other notations are used depending on past practice.
Consider the following relations on the real numbers:
R = { (x, y) | y is the square of x}
and S = { (x, y) | x ≤ y}. Then R could be more naturally
expressed as R(x) = x² or R(x) = y where y = x².
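The divides relation of Example 2 can be generated as a set of ordered pairs; a small Python sketch:

```python
A = {2, 3, 5, 6}
# R holds the pairs (a, b) with a dividing evenly into b.
R = {(a, b) for a in A for b in A if b % a == 0}
print(sorted(R))  # [(2, 2), (2, 6), (3, 3), (3, 6), (5, 5), (6, 6)]
```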
1.4.1 Relation on a
Set
A relation from a set A into itself
is called a relation on A. The relation R of Example 2 above is a
relation on A = {2, 3, 5, 6}. Let A be a set of people and let
P = {(a, b) | a ∈ A ∧ b ∈ A ∧ a is a child of b}. Then P is a
relation on A which we might call a parent-child relation.
1.4.2 Composition
Let R be a relation from a set A into a
set B, and S be a relation from set B into set C. The composition
of R and S, written as RS, is the set of pairs of the form (a,
c) ∈ A x C, where (a, c) ∈ RS if and only if there exists
b ∈ B such that (a, b) ∈ R and (b, c) ∈ S. For example
PP, where P is the parent-child relation given above, is the
composition of P with itself and it is a relation which we know as
the grandparent-grandchild relation.
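Composition of relations is just an existential match on the middle element; a Python sketch with a toy child-of relation (the names are made up for illustration):

```python
def compose(R, S):
    """All pairs (a, c) such that (a, b) is in R and (b, c) is in S for some b."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

# A child-of relation: alice is a child of bob, bob is a child of carol.
P = {("alice", "bob"), ("bob", "carol")}
print(compose(P, P))  # {('alice', 'carol')}: the grandchild relation
```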
1.4.3 Properties of Relations
Assume R is a relation on a set A; in other words, R ⊆ A x A.
Let us write a R b to denote (a, b) ∈ R.
· Reflexive: R is reflexive if for every a ∈ A, a R a.
· Symmetric: R is symmetric if for every a and b in A, if aRb,
then bRa.
· Transitive: R is transitive if for every a, b and c in A, if
aRb and bRc, then aRc.
· Equivalence: R is an equivalence relation on A if R is
reflexive, symmetric and transitive.
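These properties can be tested directly on a finite relation; a Python sketch, using the divides relation R on {2, 3, 5, 6} from Example 2:

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {2, 3, 5, 6}
R = {(a, b) for a in A for b in A if b % a == 0}
print(is_reflexive(R, A))   # True: every a divides itself
print(is_symmetric(R))      # False: (2, 6) is in R but (6, 2) is not
print(is_transitive(R))     # True: divisibility is transitive
```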
1.5 Functions
A function is something that associates each element of a set
with an element of another set (which may or may not be the same as
the first set). The concept of function appears quite often even in
nontechnical contexts. For example, a social security number
uniquely identifies the person, the income tax rate varies
depending on the income, the final letter grade for a course is
often determined by test and exam scores, homeworks and projects,
and so on. In all these cases to each member of a set some member
of another set is assigned. As you might have noticed, a function
is quite like a relation. In fact, formally, we define a function
as a special type of binary relation.
A function, denote it by f, from a set A to a set B is a
relation from A to B that satisfies
1. for each element a in A, there is an element b in B such that
<a, b> is in the relation, and
2. if <a, b> and <a, c> are in the relation, then b = c.
The set A in the above definition is called the domain of the
function and B its codomain. Thus, f is a function if it covers the
domain (maps every element of the domain) and it is single valued.
The relation given by f between a and b represented by the ordered
pair <a, b> is denoted as f(a) = b, and b is called the image of a
under f. The set of images of the elements of a set S under a
function f is called the image of the set S under f, and is denoted
by f(S), that is, f(S) = { f(a) | a ∈ S }, where S is a subset
of the domain A of f. The image of the domain under f is
called the range of f.
Example: Let f be the function from the set of natural numbers N
to N that maps each natural number x to x². Then the domain and
codomain of this f are N, the image of, say, 3 under this function
is 9, and its range is the set of squares, i.e. { 0, 1, 4, 9, 16,
... }.
Sum and product
Let f and g be functions from a set A to the set of real numbers
R. Then the sum and the product of f and g are defined as follows:
For all x, ( f + g )(x) = f(x) + g(x), and for all x, ( f*g )(x) =
f(x)*g(x), where f(x)*g(x) is the product of two real numbers f(x)
and g(x). Example: Let f(x) = 3x + 1 and g(x) = x². Then ( f + g
)(x) = x² + 3x + 1, and ( f*g )(x) = 3x³ + x².
One-to-one
A function f is said to be one-to-one (injective) if and only
if whenever f(x) = f(y), x = y. Example: The function f(x) = x²
from the set of natural numbers N to N is a one-to-one function.
Note that f(x) = x² is not one-to-one if it is from the set of
integers (negative as well as non-negative) to N, because for
example f(1) = f(-1) = 1.
Onto
A function f from a set A to a set B is said to be
onto(surjective) , if and only if for every element y of B , there
is an element x in A such that f(x) = y , that is,
f is onto if and only if f( A ) = B . Example: The
function f(x) = 2x from the set of natural numbers N to the set of
non-negative even numbers E is an onto function. However, f(x) = 2x
from the set of natural numbers N to N is not onto, because, for
example, nothing in N can be mapped to 3 by this function.
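For finite domains, one-to-one and onto can be checked by brute force; a Python sketch (the helper names are ours, chosen for illustration):

```python
def is_injective(f, dom):
    images = [f(x) for x in dom]
    return len(images) == len(set(images))   # no two inputs share an image

def is_surjective(f, dom, cod):
    return {f(x) for x in dom} == set(cod)   # every codomain element is hit

evens = {0, 2, 4, 6, 8}
print(is_surjective(lambda x: 2 * x, range(5), evens))   # True
print(is_injective(lambda x: x * x, range(5)))           # True on naturals
print(is_injective(lambda x: x * x, range(-3, 4)))       # False: f(1) == f(-1)
```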
Bijection
A function is called a bijection, if it is onto and one-to-one.
Example: The function f(x) = 2x from the set of natural numbers N
to the set of non-negative even numbers E is one-to-one and onto.
Thus it is a bijection. Every bijection has a function called the
inverse function. These concepts are illustrated in the figures
below. In each figure, the points on the left are in the
domain and the ones on the right are in the codomain, and arrows
show the < x, f(x) > relation.
Inverse
Let f be a bijection from a set A to a set B. Then the function
g is called the inverse function of f, and it is denoted by f⁻¹,
if for every element y of B, g(y) = x, where f(x) = y.
Note that such an x is unique for each y because f is a
bijection. For example, the rightmost function in the above figure
is a bijection and its inverse is obtained by reversing the
direction of each arrow. Example: The inverse function of f(x) = 2x
from the set of natural numbers N to the set of non-negative even
numbers E is f⁻¹(x) = x/2 from E to N. It is also a bijection. A
function is a relation. Therefore one can also talk about
composition of functions.
Composite function
Let g be a function from a set A to a set B, and let f be a
function from B to a set C. Then the composition of functions f
and g, denoted by fg, is the function from A to C that satisfies
fg(x) = f( g(x) ) for all x in A. Example:
Let f(x) = x² and g(x) = x + 1. Then f( g(x) ) = ( x + 1 )².
1.6 Primitive recursive function
The primitive recursive functions are defined using primitive
recursion and composition as central operations and are a strict
subset of the recursive functions (recursive functions are also
known as computable functions). The term was coined by Rózsa Péter.
In computability theory, primitive recursive functions are a class
of functions which form an important building block on the way to a
full formalization of computability. These functions are also
important in proof theory.
Many of the functions normally studied in number theory, and
approximations to real-valued functions, are primitive recursive,
such as addition, division, factorial, exponential, finding the nth
prime, and so on (Brainerd and Landweber, 1974). In fact, it is
difficult to devise a function that is not primitive recursive,
although some are known (see the section on Limitations below). The
set of primitive recursive functions is known as PR in complexity
theory. Every primitive recursive function is a general recursive
function.
The primitive recursive functions are among the number-theoretic
functions which are functions from the natural numbers (nonnegative
integers) {0, 1, 2 , ...} to the natural numbers which take n
arguments for some natural number n. Such a function is called
n-ary.
The basic primitive recursive functions are given by these
axioms:
1. Constant function: The 0-ary constant function 0 is primitive
recursive.
2. Successor function: The 1-ary successor function S, which
returns the successor of its argument (see Peano postulates), is
primitive recursive.
3. Projection function: For every n ≥ 1 and each i with 1 ≤ i ≤ n,
the n-ary projection function Pᵢⁿ, which returns its i-th argument,
is primitive recursive.
More complex primitive recursive functions can be obtained by
applying the operators given by these axioms:
1. Composition: Given f, a k-ary primitive recursive function,
and k m-ary primitive recursive functions g1,...,gk, the
composition of f with g1,...,gk, i.e. the m-ary function
h(x1,...,xm) = f(g1(x1,...,xm),...,gk(x1,...,xm)), is primitive
recursive.
2. Primitive recursion: Given f, a k-ary primitive recursive
function, and g, a (k+2)-ary primitive recursive function, the
(k+1)-ary function defined as the primitive recursion of f and g,
i.e. the function h where h(0,x1,...,xk) = f(x1,...,xk) and
h(S(n),x1,...,xk) = g(h(n,x1,...,xk),n,x1,...,xk), is primitive
recursive.
The primitive recursive functions are the basic functions and
those which can be obtained from the basic functions by applying
the operators a finite number of times.
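The two operators can be rendered as higher-order functions; a Python sketch (prim_rec evaluates the recursion with a loop rather than literal self-calls, and the names compose, prim_rec, P are our own):

```python
def compose(f, *gs):
    # h(x1..xm) = f(g1(x1..xm), ..., gk(x1..xm))
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    # h(0, xs) = f(xs); h(S(n), xs) = g(h(n, xs), n, xs)
    def h(n, *xs):
        acc = f(*xs)
        for i in range(n):
            acc = g(acc, i, *xs)
        return acc
    return h

S = lambda x: x + 1              # successor

def P(i, n):                     # projection P_i^n (1-indexed)
    return lambda *xs: xs[i - 1]

# Addition built from the basic functions, as derived in section 1.6.1.
add = prim_rec(P(1, 1), compose(S, P(1, 3)))
print(add(3, 4))  # 7
```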
Role of the projection functions
The projection functions can be used to avoid the apparent
rigidity in terms of the arity of the functions above; by using
compositions with various projection functions, it is possible to
pass a subset of the arguments of one function to another function.
For example, if g and h are 2-ary primitive recursive functions,
then f(a, b) = g(h(a, b), a) is also primitive recursive: formally,
f is the composition of g with the two 2-ary functions h and P₁²,
since h(a, b) and P₁²(a, b) = a supply g's two arguments.
Converting predicates to numeric functions
In some settings it is natural to consider primitive recursive
functions that take as inputs tuples that mix numbers with truth
values { t= true, f=false }, or which produce truth values as
outputs (see Kleene [1952 pp.226-227]). This can be accomplished by
identifying the truth values with numbers in any fixed manner. For
example, it is common to identify the truth value t with the number
1 and the truth value f with the number 0. Once this identification
has been made, the characteristic function of a set A, which
literally returns 1 or 0, can be viewed as a predicate that tells
whether a number is in the set A. Such an identification of
predicates with numeric functions will be assumed for the remainder
of this article.
1.6.1 Examples
Most number-theoretic functions which can be defined using
recursion on a single variable are primitive recursive. Basic
examples include the addition and "limited subtraction"
functions.
Addition
Intuitively, addition can be recursively defined with the
rules:
add(0,x)=x,
add(n+1,x)=add(n,x)+1.
In order to fit this into a strict primitive recursive
definition, define:
add(0, x) = P₁¹(x),
add(S(n), x) = S(P₁³(add(n, x), n, x)).
Here P₁³ is the projection function that takes 3 arguments and
returns the first one.
P₁¹ is simply the identity function; its inclusion is required
by the definition of the primitive recursion operator above; it
plays the role of f. The composition of S and P₁³, which is
primitive recursive, plays the role of g.
Subtraction
Because primitive recursive functions use natural numbers rather
than integers, and the natural numbers are not closed under
subtraction, a limited subtraction function is studied in this
context. This limited subtraction function sub(a,b) returns b − a
if this is nonnegative and returns 0 otherwise.
The predecessor function acts as the opposite of the successor
function and is recursively defined by the rules:
pred(0)=0,
pred(n+1)=n.
These rules can be converted into a more formal definition by
primitive recursion:
pred(0) = 0,
pred(S(n)) = P₂²(pred(n), n).
The limited subtraction function is definable from the
predecessor function in a manner analogous to the way addition is
defined from successor:
sub(0, x) = P₁¹(x),
sub(S(n), x) = pred(P₁³(sub(n, x), n, x)).
Here sub(a,b) corresponds to b-a; for the sake of simplicity,
the order of the arguments has been switched from the "standard"
definition to fit the requirements of primitive recursion. This
could easily be rectified using composition with suitable
projections.
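The same scheme gives predecessor and limited subtraction; a Python sketch that unrolls the recursions with loops (sub(a, b) computes b − a truncated at 0, matching the argument order in the text):

```python
def pred(n):
    # pred(0) = 0; pred(S(i)) = i
    acc = 0
    for i in range(n):
        acc = i
    return acc

def sub(a, b):
    # sub(0, x) = x; sub(S(n), x) = pred(sub(n, x))
    acc = b
    for _ in range(a):
        acc = pred(acc)
    return acc

print(sub(3, 10))  # 7, i.e. 10 - 3
print(sub(10, 3))  # 0: the result is truncated at zero
```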
Other primitive recursive functions include exponentiation and
primality testing. Given primitive recursive functions e, f, g, and
h, a function which returns the value of g when e≤f and the value
of h otherwise is primitive recursive.
Operations on integers and rational numbers
By using Gödel numbers, the primitive recursive functions can be
extended to operate on other objects such as integers and rational
numbers. If integers are encoded by Gödel numbers in a standard
way, the arithmetic operations including addition, subtraction, and
multiplication are all primitive recursive. Similarly, if the
rationals are represented by Gödel numbers then the field
operations are all primitive recursive.
1.6.2 Relationship to recursive functions
The broader class of partial recursive functions is defined by
introducing an unbounded search operator. The use of this operator
may result in a partial function, that is, a relation which has at
most one value for each argument but, unlike a total function, does
not necessarily have any value for an argument (see domain). An
equivalent definition states that a partial recursive function is
one that can be computed by a Turing machine. A total recursive
function is a partial recursive function which is defined for every
input.
Every primitive recursive function is total recursive, but not
all total recursive functions are primitive recursive. The
Ackermann function A(m,n) is a well-known example of a total
recursive function that is not primitive recursive. There is a
characterization of the primitive recursive functions as a subset
of the total recursive functions using the Ackermann function. This
characterization states that a function is primitive recursive if
and only if there is a natural number m such that the function can
be computed by a Turing machine that always halts within A(m,n) or
fewer steps, where n is the sum of the arguments of the primitive
recursive function.
1.6.3 Limitations
Primitive recursive functions tend to correspond very closely
with our intuition of what a computable function must be. Certainly
the initial functions are intuitively computable (in their very
simplicity), and the two operations by which one can create new
primitive recursive functions are also very straightforward.
However the set of primitive recursive functions does not include
every possible computable function — this can be seen with a
variant of Cantor's diagonal argument. This argument provides a
computable function which is not primitive recursive. A sketch of
the proof is as follows:
The primitive recursive functions can be computably enumerated.
This numbering is unique on the definitions of functions, though
not unique on the actual functions themselves (as every function
can have an infinite number of definitions — consider simply
composing by the identity function). The numbering is computable in
the sense that it can be defined under formal models of
computability such as μ-recursive functions or Turing machines; but
an appeal to the Church-Turing thesis is likely sufficient.
Now consider a matrix where the rows are the primitive recursive
functions of one argument under this numbering, and the columns are
the natural numbers. Then each element (i, j) corresponds to the
i-th unary primitive recursive function being calculated on the
number j. We can write this as fᵢ(j).
Now consider the function g(x) = S(fₓ(x)). g lies on the
diagonal of this matrix and simply adds one to the value it finds.
This function is computable (by the above), but clearly no
primitive recursive function exists which computes it as it differs
from each possible primitive recursive function by at least one
value. Thus there must be computable functions which are not
primitive recursive.
This argument can be applied to any class of computable (total)
functions that can be enumerated in this way, as explained in the
article Machines that always halt. Note however that the partial
computable functions (those which need not be defined for all
arguments) can be explicitly enumerated, for instance by
enumerating Turing machine encodings.
Other examples of total recursive but not primitive recursive
functions are known:
· The function which takes m to Ackermann(m,m) is a unary total
recursive function that is not primitive recursive.
· The Paris–Harrington theorem involves a total recursive
function which is not primitive recursive. Because this function is
motivated by Ramsey theory, it is sometimes considered more
“natural” than the Ackermann function.
1.6.4 Some common primitive recursive functions
The following examples and definitions are from Kleene (1952)
pp. 223-231 -- many appear with proofs. Most also appear with
similar names, either as proofs or as examples, in
Boolos-Burgess-Jeffrey 2002 pp. 63-70; they add #22 the logarithm
lo(x, y) or lg(x, y) depending on the exact derivation.
In the following we observe that primitive recursive functions
can be of four types:
1. functions ("number-theoretic functions" for short): from { 0,
1, 2, ...} to { 0, 1, 2, ...},
2. predicates: from { 0, 1, 2, ...} to truth values { t =true, f
=false },
3. propositional connectives: from truth values { t, f } to
truth values { t, f },
4. representing functions: from truth values { t, f } to { 0, 1,
2, ... }. Many times a predicate requires a representing function
to convert the predicate's output { t, f } to { 0, 1 } (note the
order "t" to "0" and "f" to "1" matches with ~sg( ) defined
below). By definition a function φ(x) is a "representing function"
of the predicate P(x) if φ takes only values 0 and 1 and produces 0
when P is true.
In the following the mark " ' ", e.g. a', is the primitive mark
meaning "the successor of", usually thought of as "+1", e.g. a + 1
=def a'. The functions #16-21 and #G are of particular interest with
respect to converting primitive recursive predicates to, and
extracting them from, their "arithmetical" form expressed as Gödel
numbers.
1. Addition: a+b
2. Multiplication: a×b
3. Exponentiation: aᵇ
4. Factorial a! : 0! = 1, a'! = a!×a'
5. pred(a): Decrement: "predecessor of a", defined as "If a > 0
then a - 1 else 0"
6. Proper subtraction: a ∸ b, defined as "If a ≥ b then a - b else
0"
7. Minimum (a1, ... an)
8. Maximum (a1, ... an)
9. Absolute value: | a - b | =def (a ∸ b) + (b ∸ a)
10. ~sg(a): NOT[signum(a)]: If a = 0 then ~sg(a) = 1 else if a > 0
then ~sg(a) = 0
11. sg(a): signum(a): If a = 0 then sg(a) = 0 else if a > 0 then
sg(a) = 1
12. "b divides a" [ a | b ]: If remainder( a, b ) = 0 then
[ a | b ] holds, else b does not divide a "evenly"
13. Remainder ( a, b ): the leftover if b does not divide a
"evenly". Also called MOD(a, b)
14. a = b: sg(| a - b |)
15. a < b: sg( a' ∸ b )
16. Pr(a): "a is a prime number": Pr(a) =def a > 1 & NOT(Exists
c)(1 < c < a & c divides a)
17. pᵢ: the i+1-st prime number
18. (a)ᵢ: the exponent of pᵢ in the prime factorization of a
19. lh(a): the "length" or number of non-vanishing exponents in
a
20. a×b: given the expression of a and b as prime factors then
a×b is the product's expression as prime factors
21. lo(x, y): logarithm of x to the base y
· In the following, the abbreviation x =def x₁, ..., xₙ;
subscripts may be applied if the meaning requires.
· #A: A function φ definable explicitly from functions Ψ and
constants q1 , ... qn is primitive recursive in Ψ.
· #B: The finite sum Σy<z ψ(x, y) and finite product Πy<z ψ(x, y)
are primitive recursive in ψ.
· #C: A predicate P obtained by substituting functions χ1 ,...,
χm for the respective variables of a predicate Q is primitive
recursive in χ1 ,..., χm, Q.
· #D: The following predicates are primitive recursive in Q and
R:
· NOT Q: ~Q(x),
· Q OR R: Q(x) V R(x),
· Q AND R: Q(x) & R(x),
· Q IMPLIES R: Q(x) → R(x)
· Q is equivalent to R: Q(x) ≡ R(x)
· #E: The following predicates are primitive recursive in the
predicate R:
· (Ey)y<z R(x, y): "there exists a y < z such that R(x, y)",
· (y)y<z R(x, y): "for all y < z, R(x, y)",
· μyy<z R(x, y): "the least y < z such that R(x, y)" (the bounded
μ-operator)
· #F: Definition by cases: The function φ defined below, where Q1,
..., Qm are mutually exclusive predicates (or "φ(x) shall have the
value given by the first clause which applies"), is primitive
recursive in φ1, ..., φm+1, Q1, ..., Qm:
φ(x) =
· φ1(x) if Q1(x) is true,
· . . . . . . . . . . . . . . . . . . .
· φm(x) if Qm(x) is true
· φm+1(x) otherwise
· #G: If φ satisfies the equation
φ(y, x) = χ(y, φ̄(y; x2, ..., xn), x2, ..., xn),
where φ̄ is the course-of-values function of φ, then φ is
primitive recursive in χ. So, in a sense, the knowledge of the
value φ̄(y; x2, ..., xn) of the course-of-values function is
equivalent to the knowledge of the sequence of values φ(0, x2, ...,
xn), ..., φ(y-1, x2, ..., xn) of the original function.
1.7 Countable set
In mathematics, a countable set is a set with the same
cardinality (i.e., number of elements) as some subset of the set of
natural numbers. The term was originated by Georg Cantor; it stems
from the fact that the natural numbers are often called counting
numbers. A set that is not countable is called uncountable.
A set S is called countable if there exists an injective
function f from S to the natural numbers N. If f is also
surjective, thus making f bijective, then S is called countably
infinite or denumerable. This terminology is not universal: some
authors define denumerable to mean what is here called "countable";
some define countable to mean what is here called "countably
infinite". The next result offers an alternative definition of a
countable set S in terms of a surjective function:
THEOREM: Let S be a set. The following statements are
equivalent:
1. S is countable, i.e. there exists an injective function f : S → N
2. Either S is empty or there exists a surjective function g : N → S
A set is a collection of elements, and may be described in many
ways. One way is simply to list all of its elements; for example,
the set consisting of the integers 3, 4, and 5 may be denoted
{3,4,5}. This is only effective for small sets, however; for larger
sets, this would be time-consuming and error-prone. Instead of
listing every single element, sometimes an ellipsis ('…') is used,
if the writer believes that the reader can easily guess what is
missing; for example, {1, 2, 3, …, 100} presumably denotes the set
of integers from 1 to 100. Even in this case, however, it is still
possible to list all the elements, because the set is finite; it
has a specific number of elements.
Some sets are infinite; these sets have more than n elements for
any integer n. For example, the set of natural numbers, denotable
by N, has infinitely many elements, and we can't use any normal
number to give its size. Nonetheless, it turns out that infinite
sets do have a well-defined notion of size (or more properly, of
cardinality, which is the technical term for the number of elements
in a set), and not all infinite sets have the same cardinality.
To understand what this means, we must first examine what it
doesn't mean. For example, there are infinitely many odd integers,
infinitely many even integers, and (hence) infinitely many integers
overall. However, it turns out that the number of odd integers,
which is the same as the number of even integers, is also the same
as the number of integers overall. This is because we arrange
things such that for every integer, there is a distinct odd
integer: … −2 → −3, −1 → −1, 0 → 1,
1 → 3, 2 → 5, …; or, more generally,
n → 2n + 1. What we have done here is arranged
the integers and the odd integers into a one-to-one correspondence
(or bijection), which is a function that maps between two sets such
that each element of each set corresponds to a single element in
the other set.
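The correspondence above can be sketched in code (a minimal illustration; the function names are ours, not the text's):

```python
# The bijection n -> 2n + 1 between the integers and the odd integers.

def to_odd(n):
    """Map an integer to a distinct odd integer."""
    return 2 * n + 1

def from_odd(m):
    """Inverse map: recover the integer from an odd integer."""
    assert m % 2 != 0
    return (m - 1) // 2

# The pairing from the text: ..., -2 -> -3, -1 -> -1, 0 -> 1, 1 -> 3, 2 -> 5, ...
pairs = [(n, to_odd(n)) for n in range(-2, 3)]
```

Because to_odd has an inverse, every integer corresponds to exactly one odd integer and vice versa, which is precisely what a one-to-one correspondence requires.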
However, not all infinite sets have the same cardinality. For
example, Georg Cantor (who introduced this branch of mathematics)
demonstrated that the real numbers cannot be put into one-to-one
correspondence with the natural numbers (non-negative integers),
and therefore that the set of real numbers has a greater
cardinality than the set of natural numbers.
A set is countable if: (1) it is finite, or (2) it has the same
cardinality (size) as the set of natural numbers. Equivalently, a
set is countable if it has the same cardinality as some subset of
the set of natural numbers. Otherwise, it is uncountable.
THEOREM: The Cartesian product of finitely many countable sets
is countable.
The proof enumerates N × N along the diagonals with a triangular
mapping that pairs each (x, y) with a single natural number. This
triangular mapping recursively generalizes to vectors of finitely
many natural numbers by repeatedly mapping the first two elements
to a natural number. For example, (0,2,3) maps to (5,3) which maps
to 41. Sometimes more than one mapping is
useful. This is where you map the set which you want to show is
countably infinite into another set, and then map this other set
to the natural numbers. For example, the positive rational numbers
can easily be mapped to (a subset of) the pairs of natural numbers
because p/q maps to (p, q).
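The triangular pairing behind the theorem can be sketched as follows. This is the standard Cantor pairing; since the numbering depends on how the diagonals are oriented, it reproduces the step (0,2) → 5 but need not reproduce every sample value quoted above (the names pair and pair_vector are ours):

```python
# One triangular (Cantor) pairing of N x N into N: enumerate the
# anti-diagonals x + y = 0, 1, 2, ... in order.

def pair(x, y):
    """Map a pair of natural numbers to a single natural number."""
    return (x + y) * (x + y + 1) // 2 + y

def pair_vector(v):
    """Collapse a vector of naturals by repeatedly pairing the first two."""
    while len(v) > 1:
        v = [pair(v[0], v[1])] + list(v[2:])
    return v[0]
```

Injectivity of pair is what makes the Cartesian product countable; pair_vector is the recursive generalization to longer vectors described in the text.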
THEOREM: Every subset of a countable set is countable. In
particular, every infinite subset of a countably infinite set is
countably infinite. For example, the set of prime numbers is
countable, by mapping the n-th prime number to n:
· 2 maps to 1
· 3 maps to 2
· 5 maps to 3
· 7 maps to 4
· 11 maps to 5
· 13 maps to 6
· 17 maps to 7
· 19 maps to 8
· 23 maps to 9
· etc.
THEOREM: Q (the set of all rational numbers) is countable.
Q can be defined as the set of all fractions a/b where a and b
are integers and b > 0. This can be mapped onto the subset of
ordered triples of natural numbers (a, b, c) such that a ≥ 0, b
> 0, a and b are coprime, and c ∈ {0, 1} such that c = 0 if a/b
≥ 0 and c = 1 otherwise.
· 0 maps to (0,1,0)
· 1 maps to (1,1,0)
· −1 maps to (1,1,1)
· 1/2 maps to (1,2,0)
· −1/2 maps to (1,2,1)
· 2 maps to (2,1,0)
· −2 maps to (2,1,1)
· 1/3 maps to (1,3,0)
· −1/3 maps to (1,3,1)
· 3 maps to (3,1,0)
· −3 maps to (3,1,1)
· 1/4 maps to (1,4,0)
· −1/4 maps to (1,4,1)
· 2/3 maps to (2,3,0)
· −2/3 maps to (2,3,1)
· 3/2 maps to (3,2,0)
· −3/2 maps to (3,2,1)
· 4 maps to (4,1,0)
· −4 maps to (4,1,1)
· ...
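The mapping in this theorem can be sketched with Python's Fraction, which reduces to lowest terms automatically (a sketch; to_triple is our name):

```python
from fractions import Fraction

# Map each rational a/b (in lowest terms, b > 0) to the triple
# (|a|, b, c), with c = 0 for non-negative rationals and c = 1 otherwise.

def to_triple(q):
    f = Fraction(q)  # Fraction reduces to lowest terms with denominator > 0
    a, b = abs(f.numerator), f.denominator
    c = 0 if f >= 0 else 1
    return (a, b, c)
```

Since distinct rationals have distinct lowest-terms representations, the map is injective into the countable set of triples of naturals, which is the point of the proof.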
THEOREM: (Assuming the axiom of countable choice) The union of
countably many countable sets is countable. For example, given
countable sets a, b, c ...
Using a variant of the triangular enumeration we saw above:
· a0 maps to 0
· a1 maps to 1
· b0 maps to 2
· a2 maps to 3
· b1 maps to 4
· c0 maps to 5
· a3 maps to 6
· b2 maps to 7
· c1 maps to 8
· d0 maps to 9
· a4 maps to 10
· ...
Note that this only works if the sets a, b, c,... are disjoint.
If not, then the union is even smaller and is therefore also
countable by a previous theorem.
THEOREM: The set of all finite-length sequences of natural
numbers is countable.
This set is the union of the length-1 sequences, the length-2
sequences, the length-3 sequences, and so on, each of which is a
countable set
(finite Cartesian product). So we are talking about a countable
union of countable sets, which is countable by the previous
theorem.
THEOREM: The set of all finite subsets of the natural numbers is
countable.
If you have a finite subset, you can order the elements into a
finite sequence. There are only countably many finite sequences, so
there are also only countably many finite subsets.
THEOREM: Suppose a set R
1. is linearly ordered, and
2. contains at least two members, and
3. is densely ordered, i.e., between any two members there is
another, and
4. has the following least upper bound property. If R is
partitioned into two nonempty sets A and B in such a way that every
member of A is less than every member of B, then there is a
boundary point c (in R), so that every point less than c is in A
and every point greater than c is in B.
Then R is not countable. The set of real numbers with its usual
ordering is a typical example of such an ordered set R; other
examples are real intervals of non-zero width (possibly with
half-open gaps) and surreal numbers. The set of rational numbers
(which is countable) has properties 1, 2, and 3 but does not have
property 4.
The proof
The proof is by contradiction. It begins by assuming R is
countable and thus that some sequence x1, x2, x3, ... has all of R
as its range. Define two other sequences (an) and (bn) as
follows:
Pick a1 < b1 in R (possible because of property 2).
Let a(n+1) be the first element in the sequence x that
is strictly between a(n) and b(n) (possible because of property 3).
Let b(n+1) be the first element in the sequence x that
is strictly between a(n+1) and b(n).
The two monotone sequences a and b move toward each other. By
the completeness of R, some point c must lie between them. (Define
A to be the set of all elements in R that are smaller than some
member of the sequence a, and let B be the complement of A; then
every member of A is smaller than every member of B, and so
property 4 yields the point c.) Since c is an element of R and the
sequence x represents all of R, we must have c = xi for some index
i (i.e., there must exist an xi in the sequence x, corresponding to
c.) But, when that index was reached in the process of defining the
sequences a and b, c would have been added as the next member of
one or the other sequence, contrary to the fact that c lies
strictly between the two sequences. This contradiction finishes the
proof.
1.8 Fundamental Proof Techniques
1. The Principle of Mathematical Induction
2. The Pigeonhole Principle
3. The Diagonalization Principle
1.8.1 The Principle of Mathematical Induction
Suppose there is a given statement P(n) involving the natural
number n such that
(i) The statement is true for n = 1, i.e., P(1) is true, and
(ii) If the statement is true for n = k (where k is some
positive integer), then the statement is also true for n = k + 1,
i.e., truth of P(k) implies the truth of P (k + 1). Then, P(n) is
true for all natural numbers n.
Property (i) is simply a statement of fact. There may be
situations when a statement is true for all n ≥ 4. In this case,
step 1 will start from n = 4 and we shall verify the result for n =
4, i.e., P(4). Property (ii) is a conditional property. It does not
assert that the given statement is true for n = k, but only that if
it is true for n = k, then it is also true for n = k + 1. So, to
prove that the property holds, we need only prove the conditional
proposition: If the statement is true for n = k, then it is also
true for n = k + 1. This is sometimes referred to as the inductive
step. The assumption that the given statement is true for n = k in
this inductive step is called the inductive hypothesis.
For example, frequently in mathematics, a formula will be
discovered that appears to fit a pattern like
1 = 1² = 1
4 = 2² = 1 + 3
9 = 3² = 1 + 3 + 5
16 = 4² = 1 + 3 + 5 + 7, etc.
It is worth noting that the sum of the first two odd natural
numbers is the square of the second natural number, the sum of the
first three odd natural numbers is the square of the third natural
number, and so on. Thus, from this pattern it appears that 1 + 3 + 5
+ 7 + ... + (2n – 1) = n², i.e., the sum of the first n odd natural
numbers is the square of n.
Let us write P(n): 1 + 3 + 5 + 7 + ... + (2n – 1) = n². We wish
to prove that P(n) is true for all n. The first step in a proof
that uses mathematical induction is to prove that P(1) is true.
This step is called the base step. Obviously 1 = 1², i.e., P(1) is
true.
The next step is called the inductive step. Here, we suppose
that P (k) is true for some positive integer k and we need to prove
that P (k + 1) is true. Since P (k) is true, we have
1 + 3 + 5 + 7 + ... + (2k – 1) = k² ... (1).
Consider
1 + 3 + 5 + 7 + ... + (2k – 1) + {2(k + 1) – 1} ... (2)
= k² + (2k + 1) = (k + 1)² [Using (1)].
Therefore, P (k + 1) is true and the inductive proof is now
completed. Hence P(n) is true for all natural numbers n.
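The identity just proved is easy to check numerically (a sketch, not part of the proof; the induction itself, of course, covers all n):

```python
# Numeric check of 1 + 3 + 5 + ... + (2n - 1) = n^2.

def sum_first_odds(n):
    """Sum the first n odd natural numbers."""
    return sum(2 * k - 1 for k in range(1, n + 1))

# The inductive step in miniature: adding the next odd number
# 2(k + 1) - 1 to k^2 gives (k + 1)^2.
assert sum_first_odds(4) == 16
```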
1.8.2 The Pigeonhole Principle
· Let A and B be finite sets with |A| > |B|. Then there does
not exist a one-to-one function f : A → B.
· Thus, if |A| > |B|, then for any function f : A → B, it
must be the case that f(a1) = f(a2) for some a1, a2 ∈ A.
· That is, if you have more pigeons than pigeonholes, then at
least two pigeons must occupy the same pigeonhole.
Example:
· Theorem: Let G be a graph with n vertices. If there is a path
from vertex A to vertex B (A ≠ B), then the shortest path from A to
B has length at most n – 1.
· (The length of a path is 1 less than the number of vertices
along that path.)
· Proof:
· Let (a0 , a1 , a2, …, am), where a0 = A and am =
B, be the shortest path from A to B.
· This path has length m.
· Suppose that m > n – 1.
· Then some vertex ai must be repeated along the path as aj.
· That means that the path contains a loop from ai to aj .
· This loop may be excised, resulting in a shorter path (a0, …,
a(i–1), ai, a(j+1), …, am) from A to B.
· That is a contradiction. Therefore, the shortest path must
have length no more than n – 1.
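The loop-excision step can be sketched in code (a minimal illustration; excise_loops is our own helper name, and it repeatedly cuts out the segment between two occurrences of a repeated vertex):

```python
# Remove loops from a path: whenever a vertex repeats, cut out the
# segment between its two occurrences. The result visits each vertex
# at most once, so its length is at most n - 1.

def excise_loops(path):
    """Cut out the segment between repeated occurrences of a vertex."""
    out = []
    for v in path:
        if v in out:
            # v already appeared: drop everything after its first occurrence.
            out = out[:out.index(v)]
        out.append(v)
    return out
```

The endpoints are preserved, and since no vertex repeats in the result, a graph on n vertices admits a path of length at most n − 1 between any two connected vertices.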
1.8.3 The Diagonalization Principle
· Let R be a binary relation on a set A.
· Let D be the diagonal set for R:
D = {a ∈ A | (a, a) ∉ R}.
· For each a in A, define a set Ra to be
Ra = {b ∈ A | (a, b) ∈ R}.
· Then D is distinct from each Ra.
Example 1:
· Theorem: The set 2^N (the power set of N) is uncountable.
· Proof:
· Suppose that 2^N is countable.
· Then there is a one-to-one function f : 2^N → N.
· That is, the elements of 2^N may be listed A0, A1, A2, …
· Define the diagonal set D to be D = {i | i ∉ Ai}.
· Then D ≠ Ai for all i. (Why?)
· But D is a subset of N, so it must equal some Ai.
· That is a contradiction. Therefore, 2^N is not countable.
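The diagonal construction can be run on a finite sample listing (illustrative only; a real listing would be infinite, but the same definition of D applies):

```python
# Diagonalization on a finite sample: given any listing A_0, A_1, ...
# of subsets of N, the set D = {i | i not in A_i} differs from every
# listed set at the "diagonal" element i.

listing = [set(), {0, 1}, {1}, {0, 2, 3}]   # an arbitrary sample listing
D = {i for i, Ai in enumerate(listing) if i not in Ai}
```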
Example 2:
· Theorem: The set of all functions from N to N is
uncountable.
· Proof:
· Suppose that the set is countable.
· Then its elements can be listed f0, f1, f2, …
· Define a function g as g(i) = 0 if fi(i) ≠ 0, and g(i) = 1 if
fi(i) = 0.
· Then g ≠ fi for all i.
· But g is a function from N to N, so it must equal some fi.
· That is a contradiction. Therefore, the set is not
countable.
Example 3:
· Theorem: The set of real numbers in the interval [0, 1) is
uncountable.
· Proof:
· Suppose that the set is countable.
· Then its elements can be listed x0, x1, x2, …
· Write each xi in its decimal expansion: xi = 0.di0di1di2…
· Define a number x whose i-th digit is 0 if dii ≠ 0 and 1
if dii = 0.
· Then x ≠ xi for all i.
· But x is a real number in [0, 1), so it must equal some
xi.
· That is a contradiction. Therefore, the set is not
countable.
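The same diagonal trick on digit sequences, again on a finite sample (the digits here are arbitrary, chosen only to illustrate the construction):

```python
# From any listing of decimal expansions, build a digit sequence that
# differs from the i-th listed expansion at its i-th digit.

digit_rows = [          # d_i0, d_i1, d_i2, ... for each listed x_i
    [1, 4, 1, 5],
    [0, 0, 0, 0],
    [9, 9, 9, 9],
    [5, 0, 5, 0],
]
new_digits = [0 if row[i] != 0 else 1 for i, row in enumerate(digit_rows)]
```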
1.9 Formal representation of languages
First, an alphabet is a finite set of symbols. For example {0,
1} is an alphabet with two symbols, {a, b} is another alphabet with
two symbols, and the English alphabet is also an alphabet. A string
(also called a word) is a finite sequence of symbols of an
alphabet. b, a and aabab are examples of strings over the alphabet
{a, b}, and 0, 10 and 001 are examples of strings over the alphabet
{0, 1}.
A language is a set of strings over an alphabet. Thus {a, ab,
baa} is a language (over alphabet {a,b}) and {0, 111} is a
language (over alphabet {0,1}). The number of symbols in a string
is called the length of the string. For a string w its length is
represented by |w|. It can be defined more formally by a recursive
definition. The empty string (also called the null string) is the
string with length 0. That is, it has no symbols. The empty string
is denoted by Λ (capital lambda). Thus |Λ| = 0.
Let u and v be strings. Then uv denotes the string obtained by
concatenating u with v, that is, uv is the string obtained by
appending the sequence of symbols of v to that of u. For example if
u = aab and v = bbab, then uv = aabbbab. Note that vu = bbabaab ≠ uv.
We are going to use first few symbols of English alphabet such as a
and b to denote symbols of an alphabet and those toward the end
such as u and v for strings. A string x is called a substring of
another string y if there are strings u and v such that y = uxv.
Note that u and v may be empty strings. So a string is a
substring of itself. A string x is a prefix of another string y if
there is a string v such that y = xv. v is called a suffix of
y.
Some special languages
The empty set is a language which has no strings. The set {Λ} is
a language which has one string, namely Λ. Though Λ has no symbols,
this set has an object in it. So it is not empty. For any alphabet
Σ, the set of all strings over Σ (including the empty string) is
denoted by Σ*. Thus a language over alphabet Σ is a subset of Σ*.
1.9.1 Operations on languages
Since languages are sets, all the set operations can be applied
to languages. Thus the union, intersection and difference of two
languages over an alphabet Σ are languages over Σ. The complement
of a language L over an alphabet Σ is Σ* - L and it is also a
language. Another operation on languages is concatenation. Let L1
and L2 be languages. Then the concatenation of L1 with L2 is
denoted as L1L2 and it is defined as L1L2 = { uv | u ∈ L1 and v ∈
L2 }. That is, L1L2 is
the set of strings obtained by concatenating strings of L1 with
those of L2. For example {ab, b} {aaa, abb, aaba} = {abaaa, ababb,
abaaba, baaa, babb, baaba}.
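The definition of concatenation translates directly into a set comprehension (a sketch; concat is our own name):

```python
# Language concatenation, mirroring L1 L2 = { uv | u in L1 and v in L2 }.

def concat(L1, L2):
    return {u + v for u in L1 for v in L2}

example = concat({"ab", "b"}, {"aaa", "abb", "aaba"})
```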
Powers : For a symbol a ∈ Σ and a natural number k, a^k represents
the concatenation of k a's. For a string u ∈ Σ* and a natural
number k, u^k denotes the concatenation of k u's. Similarly for a
language L, L^k means the concatenation of k L's. Hence L^k is the
set of strings that can be obtained by concatenating k strings of
L. These powers can be formally defined recursively. For example
L^k can be defined recursively as follows.
Recursive definition of L^k:
Basis Clause: L^0 = { Λ }. Inductive Clause: L^(k+1) = L^k L. Since
L^k is defined for natural numbers k, the extremal clause is not
necessary. a^k and u^k can be defined similarly. Here a^0 = Λ and
u^0 = Λ. The following two types of languages are generalizations
of Σ* and we are going to see them quite often in this course.
Recursive definition of L*:
Basis Clause: Λ ∈ L*
Inductive Clause: For any x ∈ L* and any w ∈ L, xw ∈ L*. Extremal
Clause: Nothing is in L* unless it is obtained from the above two
clauses. L* is the set of strings obtained by concatenating zero or
more strings of L, as we are going to see in Theorem 1. This * is
called the Kleene star. For example if L = { aba, bb }, then L* = {
Λ, aba, bb, ababb, abaaba, bbbb, bbaba, ... } The * in Σ* is also
the same Kleene star defined above.
Recursive definition of L+:
Basis Clause: L ⊆ L+
Inductive Clause: For any x ∈ L+ and any w ∈ L, xw ∈ L+. Extremal
Clause: Nothing is in L+ unless it is obtained from the above two
clauses. Thus L+ is the set of strings obtained by concatenating
one or more strings of L. For example if L = { aba, bb }, then L+ =
{ aba, bb, ababb, abaaba, bbbb, bbaba, ... }
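The recursive definitions of powers and (a finite slice of) the Kleene star can be sketched as follows (the helper names are ours; the empty string "" plays the role of Λ):

```python
# Powers and closures under the recursive definitions: L^0 = {Λ} and
# L^(k+1) = L^k L; star_up_to gives the strings of L* built from at
# most m factors (a finite slice of the infinite set L*).

def concat(L1, L2):
    return {u + v for u in L1 for v in L2}

def power(L, k):
    result = {""}                 # L^0 = { Λ }
    for _ in range(k):
        result = concat(result, L)   # L^(k+1) = L^k L
    return result

def star_up_to(L, m):
    """All concatenations of at most m strings of L."""
    out = set()
    for k in range(m + 1):
        out |= power(L, k)
    return out
```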
Let us also define the union L^0 ∪ L^1 ∪ L^2 ∪ ... , that is, { x |
x ∈ L^k for some natural number k }. Then the following
relationships hold on L* and L+. Theorems 1 and 2 are proven in
"General Induction" which you study in the next unit. Other proofs
are omitted.
Theorem 1: L^n ⊆ L* for any natural number n
Theorem 2: L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
Theorem 3: L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...
Theorem 4: L+ = L L* = L*L
Note: According to Theorems 2 and 3, any nonempty string in L*
or L+ can be expressed as the concatenation of strings of L, i.e.
w1w2...wk for some k, where the wi's are strings of L. L* and L+
have a number of interesting properties. Let us list one of them as
a theorem and prove it.
Theorem 5: L* = (L*)*.
Proof: By applying Theorem 2 to the language L*, we can see
that L* ⊆ (L*)*. Conversely, (L*)* ⊆ L* can be
proven as follows: Let x be an arbitrary nonempty string of
(L*)*. Then there are nonempty strings w1, w2, ..., wk in L* such
that x = w1w2...wk. Since w1, w2, ..., wk are strings of L*, for
each wi there are strings wi1, wi2, ..., wimi in L such that wi =
wi1wi2...wimi. Hence x = w11...w1m1 w21...w2m2 ... wk1...wkmk.
Hence x is in L*. If x is an empty string, then it is obviously in
L*. Hence (L*)* ⊆ L*. Since L* ⊆ (L*)* and (L*)* ⊆ L*, L* = (L*)*.
1.10 Regular expression
In computing, regular expressions provide a concise and flexible
means for identifying strings of text of interest, such as
particular characters, words, or patterns of characters. Regular
expressions (abbreviated as regex or regexp, with plural forms
regexes, regexps, or regexen) are written in a formal language that
can be interpreted by a regular expression processor, a program
that either serves as a parser generator or examines text and
identifies parts that match the provided specification.
The following examples illustrate a few specifications that
could be expressed in a regular expression:
· the sequence of characters "car" in any context, such as
"car", "cartoon", or "bicarbonate"
· the word "car" when it appears as an isolated word
· the word "car" when preceded by the word "blue" or "red"
· a dollar sign immediately followed by one or more digits, and
then optionally a period and exactly two more digits
Regular expressions can be much more complex than these
examples. Regular expressions are used by many text editors,
utilities, and programming languages to search and manipulate text
based on patterns. For example, Perl and Tcl have a powerful
regular expression engine built directly into their syntax. Several
utilities provided by Unix distributions—including the editor ed
and the filter grep—were the first to popularize the concept of
regular expressions.
As an example of the syntax, the regular expression \bex can be
used to search for all instances of the string "ex" that occur
after word boundaries (signified by the \b). Thus in the string
"Texts for experts," \bex matches the "ex" in "experts" but not in
"Texts" (because the "ex" occurs inside a word and not immediately
after a word boundary).
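Run with Python's re module, the example behaves as described (a minimal sketch):

```python
import re

# \b is a word boundary, so \bex matches the "ex" in "experts" but
# not the one inside "Texts"; we collect the match positions.

matches = [m.start() for m in re.finditer(r"\bex", "Texts for experts")]
```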
Many modern computing systems provide wildcard characters in
matching filenames from a file system. This is a core capability of
many command-line shells and is also known as globbing. Wildcards
differ from regular expressions in that they generally only express
very limited forms of alternatives.
Basic concepts
A regular expression, often called a pattern, is an expression
that describes a set of strings. It is usually used to give a
concise description of a set, without having to list all of the
set's elements.
1.11 Formal language theory
Regular expressions can be expressed in terms of formal language
theory. Regular expressions consist of constants and operators that
denote sets of strings and operations over these sets,
respectively. Given a finite alphabet Σ the following constants are
defined:
· (empty set) ∅ denoting the set ∅
· (empty string) ε denoting the set {ε}
· (literal character) a in Σ denoting the set {a}
The following operations are defined:
· (concatenation) RS denoting the set { αβ | α in R and β in S
}. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd",
"cef"}.
· (alternation) R|S denoting the set union of R and S. Many
textbooks use the symbols ∪, +, or ∨ for alternation instead of the
vertical bar. For example {"ab", "c"}∪{"d", "ef"} = {"ab", "c",
"d", "ef"}
· (Kleene star) R* denoting the smallest superset of R that
contains ε and is closed under string concatenation. This is the
set of all strings that can be made by concatenating zero or more
strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab",
"abc", "cab", "cc", "ababab", "abcab", ... }.
The above constants and operators form a Kleene algebra. To
avoid brackets it is assumed that the Kleene star has the highest
priority, then concatenation and then set union. If there is no
ambiguity then brackets may be omitted. For example, (ab)c can be
written as abc, and a|(b(c*)) can be written as a|bc*.
Examples:
· a|b* denotes {ε, a, b, bb, bbb, ...}
· (a|b)* denotes the set of all strings with no symbols other
than a and b, including the empty string: {ε, a, b, aa, ab, ba, bb,
aaa, ...}
· ab*(c|ε) denotes the set of strings starting with a, then zero
or more bs and finally optionally a c: {a, ac, ab, abc, abb,
...}
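These examples can be checked with Python's re.fullmatch; Python's regex syntax is richer than the formal one, but these particular patterns carry over directly, writing (c|ε) as (c|):

```python
import re

# The example expressions above, tested with fullmatch (which requires
# the whole string to match, like membership in the denoted language).

assert re.fullmatch(r"a|b*", "bbb")        # the b* branch
assert re.fullmatch(r"a|b*", "")           # ε is in b*
assert not re.fullmatch(r"a|b*", "ab")     # a|b* is not (a|b)*
assert re.fullmatch(r"(a|b)*", "abba")
assert re.fullmatch(r"ab*(c|)", "abbc")
assert not re.fullmatch(r"ab*(c|)", "cb")
```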
The formal definition of regular expressions is purposely
parsimonious and avoids defining the redundant quantifiers ? and +,
which can be expressed as follows: a+ = aa*, and a? = (a|ε).
Sometimes the complement operator ~ is added; ~R denotes the set of
all strings over Σ* that are not in R. The complement operator is
redundant, as it can always be expressed by using the other
operators (although the process for computing such a representation
is complex, and the result may be exponentially larger).
Regular expressions in this sense can express the regular
languages, exactly the class of languages accepted by finite state
automata. There is, however, a significant difference in
compactness. Some classes of regular languages can only be
described by automata that grow exponentially in size, while the
length of the required regular expressions only grow linearly.
Regular expressions correspond to the type-3 grammars of the
Chomsky hierarchy. On the other hand, there is a simple mapping
from regular expressions to nondeterministic finite automata (NFAs)
that does not lead to such a blowup in size; for this reason NFAs
are often used as alternative representations of regular
expressions.
It is possible to write an algorithm which, for two given regular
expressions, decides whether the described languages are equal: it
reduces each expression to a minimal deterministic finite
state machine and determines whether the two machines are
isomorphic (equivalent).
To what extent can this redundancy be eliminated? Can we find an
interesting subset of regular expressions that is still fully
expressive? Kleene star and set union are obviously required, but
perhaps we can restrict their use. This turns out to be a
surprisingly difficult problem. As simple as the regular
expressions are, it turns out there is no method to systematically
rewrite them to some normal form. The lack of axiomatization in the
past led to the star height problem. Recently, Cornell University
professor Dexter Kozen axiomatized regular expressions with Kleene
algebra.
Patterns for non-regular languages
Many features found in modern regular expression libraries
provide an expressive power that far exceeds the regular languages.
For example, the ability to group subexpressions with parentheses
and recall the value they match in the same expression means that a
pattern can match strings of repeated words like "papa" or
"WikiWiki", called squares in formal language theory. The pattern
for these strings is (.*)\1. However, the language of squares is
not regular, nor is it context-free. Pattern matching with an
unbounded number of back references, as supported by numerous
modern tools, is NP-hard.
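A back-reference square matcher can be sketched with Python's re (we use (.+)\1 so that the empty string does not count as a square; the text's (.*)\1 would also accept it):

```python
import re

# (.+)\1 matches a nonempty string followed by an exact copy of the
# captured group, i.e. a "square" in the formal-language sense.

def is_square(s):
    return re.fullmatch(r"(.+)\1", s) is not None
```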
However, many tools, libraries, and engines that provide such
constructions still use the term regular expression for their
patterns. This has led to a nomenclature where the term regular
expression has different meanings in formal language theory and
pattern matching. For this reason, some people have taken to using
the term regex or simply pattern to describe the latter. Larry Wall
(author of Perl) writes in Apocalypse 5:
“
'Regular expressions' [...] are only marginally related to real
regular expressions. Nevertheless, the term has grown with the
capabilities of our pattern matching engines, so I'm not going to
try to fight linguistic necessity here. I will, however, generally
call them "regexes" (or "regexen", when I'm in an Anglo-Saxon
mood).[2]
”
Implementations and running times
There are at least two fundamentally different algorithms that
decide if and how a given regular expression matches a string. The
oldest and fastest relies on a result in formal language theory
that allows every nondeterministic finite state machine (NFA) to be
transformed into a deterministic finite state machine (DFA). The
algorithm performs or simulates this transformation and then runs
the resulting DFA on the input string, one symbol at a time. The
latter process takes time linear to the length of the input string.
More precisely, an input string of size n can be tested against a
regular expression of size m in time O(n+2m) or O(nm), depending on
the details of the implementation. This algorithm is often referred
to as DFA. It is fast, but can only be used for matching and not
for recalling grouped subexpressions, lazy quantification, and
several other features commonly found in modern regular expression
libraries. It is also possible to run the NFA directly, essentially
building each DFA state on demand and then discarding it at the
next step. This avoids the exponential memory requirements of a
fully-constructed DFA while still guaranteeing linear-time
search.[3]
The other algorithm is to match the pattern against the input
string by backtracking. This algorithm is commonly called NFA, but
this terminology can be confusing. Its running time can be
exponential, which simple implementations exhibit when matching
against expressions like (a|aa)*b that contain both alternation and
unbounded quantification and force the algorithm to consider an
exponentially increasing number of sub-cases. More complex
implementations will often identify and speed up or abort common
cases where they would otherwise run slowly.
Although backtracking implementations run in exponential time
in the worst case, they provide much greater flexibility
and expressive power. For example, any implementation which allows
the use of backreferences, or implements the various improvements
introduced by Perl, must use a backtracking implementation.
Some implementations try to provide the best of both algorithms
by first running a fast DFA match to see if the string matches the
regular expression at all, and only in that case perform a
potentially slower backtracking match.
1.11.1 Uses of regular expressions
Regular expressions are useful in the production of syntax
highlighting systems, data validation, and many other tasks. While
regular expressions would be useful on search engines such as
Google or Live Search, processing them across the entire database
could consume excessive computer resources depending on the
complexity and design of the regex. Although in many cases system
administrators can run regex-based queries internally, most search
engines do not offer regex support to the public. A notable
exception is Google Code Search.
1.12 Regular Expressions
Just as finite automata are used to recognize patterns of
strings, regular expressions are used to generate patterns of
strings. A regular expression is an algebraic formula whose value
is a pattern consisting of a set of strings, called the language of
the expression.
Operands in a regular expression can be:
· characters from the alphabet over which the regular expression
is defined.
· variables whose values are any pattern defined by a regular
expression.
· epsilon which denotes the empty string containing no
characters.
· null which denotes the empty set of strings.
Operators used in regular expressions include:
· Union: If R1 and R2 are regular expressions, then R1 | R2
(also written as R1 U R2 or R1 + R2) is also a regular
expression.
L(R1|R2) = L(R1) U L(R2).
· Concatenation: If R1 and R2 are regular expressions, then R1R2
(also written as R1.R2) is also a regular expression.
L(R1R2) = L(R1) concatenated with L(R2).
· Kleene closure: If R1 is a regular expression, then R1* (the
Kleene closure of R1) is also a regular expression.
L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...
Closure has the highest precedence, followed by concatenation,
followed by union.
Examples
The set of strings over {0,1} that end in 3 consecutive 1's.
(0 | 1)* 111
The set of strings over {0,1} that have at least one 1.
0* 1 (0 | 1)*
The set of strings over {0,1} that have at most one 1.
0* | 0* 1 0*
The set of strings over {A..Z,a..z} that contain the word
"main".
Let <letter> = A | B | ... | Z | a | b | ... | z
<letter>* main <letter>*
The set of strings over {A..Z,a..z} that contain 3 x's.
<letter>* x <letter>* x <letter>* x <letter>*
The set of identifiers in Pascal
Let <letter> = A | B | ... | Z | a | b | ... | z
Let <digit> = 0 | 1 | 2 | 3 ... | 9
<letter> (<letter> | <digit>)*
The set of real numbers in Pascal
Let <digit> = 0 | 1 | 2 | 3 ... | 9
Let <exp> = 'E' <digit>* | epsilon
Let <sign> = '+' | '-' | epsilon
Let <frac> = '.' <digit>* | epsilon
<sign> <digit> <digit>* <frac> <exp>
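Rendered as Python character-class regexes, the identifier and real-number patterns might look like this (a plausible sketch, not the exact Pascal lexical definition; the class names are spelled out inline):

```python
import re

# <letter> becomes [A-Za-z], <digit> becomes [0-9]; the optional sign,
# fraction, and exponent parts become optional groups.

identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*")
real_number = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

assert identifier.fullmatch("count2")
assert not identifier.fullmatch("2count")
assert real_number.fullmatch("-3.14E+10")
```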
1.13 Chomsky hierarchy
Within the field of computer science, specifically in the area
of formal languages, the Chomsky hierarchy (occasionally referred
to as Chomsky–Schützenberger hierarchy) is a containment hierarchy
of classes of formal grammars. This hierarchy of grammars was
described by Noam Chomsky in 1956 (see [1]). It is also
named after Marcel-Paul Schützenberger who played a crucial role in
the development of the theory of formal languages.
1.13.1 Formal grammars
A formal grammar of this type consists of:
· a finite set of terminal symbols
· a finite set of nonterminal symbols
· a finite set of production rules with a left and a right-hand
side consisting of a sequence of these symbols
· a start symbol
A formal grammar defines (or generates) a formal language, which
is a (usually infinite) set of finite-length sequences of symbols
(i.e. strings) that may be constructed by applying production rules
to another sequence of symbols which initially contains just the
start symbol. A rule may be applied to a sequence of symbols by
replacing an occurrence of the symbols on the left-hand side of the
rule with those that appear on the right-hand side. A sequence of
rule applications is called a derivation. Such a grammar defines
the formal language: all words consisting solely of terminal
symbols which can be reached by a derivation from the start
symbol.
Nonterminals are usually represented by uppercase letters,
terminals by lowercase letters, and the start symbol by S. For
example, the grammar with terminals {a,b}, nonterminals {S,A,B},
production rules
S → ABS
S → ε (where ε is the empty string)
BA → AB
BS → b
Bb → bb
Ab → ab
Aa → aa
and start symbol S, defines the language of all words of the
form a^n b^n (i.e. n copies of a followed by n copies of b). The
following is a simpler grammar that defines a similar language:
Terminals {a,b}, Nonterminals {S}, Start symbol S, Production
rules
S → aSb
S → ε
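A derivation in the simpler grammar can be simulated directly (derive is our own name; each application of S → aSb wraps the sentential form in one more a…b pair, and S → ε finishes the derivation):

```python
# Derivations in the grammar S -> aSb | ε, which generates a^n b^n.

def derive(n):
    """Apply S -> aSb n times, then S -> ε, yielding a^n b^n."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb")
    return form.replace("S", "")

assert derive(3) == "aaabbb"
```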
1.13.2 The hierarchy
The Chomsky hierarchy consists of the following levels:
· Type-0 grammars (unrestricted grammars) include all formal
grammars. They generate exactly all languages that can be
recognized by a Turing machine. These languages are also known as
the recursively enumerable languages. Note that this is different
from the recursive languages which can be decided by an
always-halting Turing machine.
· Type-1 grammars (context-sensitive grammars) generate the
context-sensitive languages. These grammars have rules of the form
αAβ → αγβ with A a nonterminal and α, β and γ strings of terminals
and nonterminals. The strings α and β may be empty, but γ must be
nonempty. The rule S → ε is allowed if S does not appear on the
right side of any rule. The languages described by these grammars are
exactly all languages that can be recognized by a linear bounded
automaton (a nondeterministic Turing machine whose tape is bounded
by a constant times the length of the input.)
· Type-2 grammars (context-free grammars) generate the
context-free languages. These are defined by rules of the form
A → γ with A a nonterminal and γ a string of terminals and nonterminals. These
languages are exactly all languages that can be recognized by a
non-deterministic pushdown automaton. Context-free languages are
the theoretical basis for the syntax of most programming
languages.
· Type-3 grammars (regular grammars) generate the regular
languages. Such a grammar restricts its rules to a single
nonterminal on the left-hand side and a right-hand side consisting
of a single terminal, possibly followed (or preceded, but not both
in the same grammar) by a single nonterminal. The rule S → ε is
also allowed here if S does not appear on the right side of any rule.
These languages are exactly all languages that can be decided by a
finite state automaton. Additionally, this family of formal
languages can be described by regular expressions. Regular languages
are commonly used to define search patterns and the lexical
structure of programming languages.
Note that the set of grammars corresponding to recursive
languages is not a member of this hierarchy. Every regular language
is context-free, every context-free language is context-sensitive,
every context-sensitive language is recursive, and every
recursive language is recursively enumerable. These are all proper
inclusions, meaning that there exist recursively enumerable
languages which are not context-sensitive, context-sensitive
languages which are not context-free and context-free languages
which are not regular.
The following table summarizes each of Chomsky's four types of
grammars, the class of language it generates, the type of automaton
that recognizes it, and the form its rules must have.