Module 1 : Introduction to the theory of computation – Set theory – Definition of sets – Properties – Countability – Uncountability – Equinumerous sets – Functions – Primitive recursive and partial recursive functions – Computable and non computable

Module 1

Theory of computation

The theory of computation is the branch of computer science that deals with whether and how efficiently problems can be solved on a model of computation, using an algorithm. The field is divided into two major branches: computability theory and complexity theory, but both branches deal with formal models of computation.

In order to perform a rigorous study of computation, computer scientists work with a mathematical abstraction of computers called a model of computation. There are several models in use, but the most commonly examined is the Turing machine. A Turing machine can be thought of as a desktop PC with a potentially infinite memory capacity, though it can only access this memory in small discrete chunks.

Computer scientists study the Turing machine because it is simple to formulate, can be analyzed and used to prove results, and because it represents what many consider the most powerful possible "reasonable" model of computation. It might seem that the potentially infinite memory capacity is an unrealizable attribute, but any decidable problem solved by a Turing machine will always require only a finite amount of memory. So in principle, any problem that can be solved (decided) by a Turing machine can be solved by a computer that has a bounded amount of memory.

1.1 Computability theory

Computability theory deals primarily with the question of whether a problem is solvable at all on a computer. The statement that the halting problem cannot be solved by a Turing machine is one of the most important results in computability theory, as it is an example of a concrete problem that is both easy to formulate and impossible to solve using a Turing machine. Much of computability theory builds on the halting problem result.

The next important step in computability theory was Rice's theorem, which states that for all non-trivial properties of partial functions, it is undecidable whether a Turing machine computes a partial function with that property.

Computability theory is closely related to the branch of mathematical logic called recursion theory, which removes the restriction of studying only models of computation which are close to physically realizable. Many mathematicians and computational theorists who study recursion theory refer to it as computability theory. There is no real difference between the fields other than whether a researcher working in this area has his or her office in a computer science or a mathematics department.

1.2 Complexity theory

Complexity theory considers not only whether a problem can be solved at all on a computer, but also how efficiently the problem can be solved. Two major aspects are considered: time complexity and space complexity, which are, respectively, how many steps it takes to perform a computation and how much memory is required to perform that computation.

In order to analyze how much time and space a given algorithm requires, computer scientists express the time or space required to solve the problem as a function of the size of the input problem. For example, finding a particular number in a long list of numbers becomes harder as the list of numbers grows larger. If we say there are n numbers in the list, then if the list is not sorted or indexed in any way we may have to look at every number in order to find the number we're seeking. We thus say that in order to solve this problem, the computer needs to perform a number of steps that grows linearly in the size of the problem.

To simplify this analysis, computer scientists have adopted Big O notation, which allows functions to be compared in a way that ensures that particular aspects of a machine's construction do not need to be considered, but rather only the asymptotic behavior as problems become large. So in our previous example we might say that the problem requires O(n) steps to solve.
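As a concrete illustration, the following minimal Python sketch (the function name and sample list are ours, not from any standard library) performs the unsorted-list search just described; in the worst case the loop examines all n items, which is what O(n) captures.

    def linear_search(items, target):
        # Scan left to right; the worst case examines all n items: O(n) steps.
        for index, value in enumerate(items):
            if value == target:
                return index
        return -1  # target absent: every one of the n items was examined

    print(linear_search([7, 3, 9, 12, 5], 12))  # prints 3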

Perhaps the most important open problem in all of computer science is the question of whether a certain broad class of problems denoted NP can be solved efficiently. This is discussed further at Complexity classes P and NP.

1.3 Set Theory

        A set is a group of elements having a property that characterizes those elements. One way to define a set is to enumerate the elements completely: all the elements belonging to the set are explicitly given. Example: A = {1, 2, 3, 4, 5}. An alternate way is to give the properties that characterize the elements of the set. Example: B = {x | x is a positive integer less than or equal to 5}. Some sets can also be defined recursively.

1.3.1 Set terminology

        Belongs To

x ∈ B means that x is an element of set B. Using this notation we can specify the set {0, 1, 2, 3, 4, 5}, call it Z, by writing Z = {x | x ∈ N ∧ x ≤ 5}, where N represents the set of natural numbers. It is read as "the set of natural numbers that are less than or equal to 5".

       Subset             Let A and B be two sets.

            A is a subset of B, if every element of A is an element of B.

            A is a subset of B is represented as A ⊆ B.

            Note:  If A is a subset of B and B is a subset of A then A = B. Also, if A is a subset of, but not equal to, B, this is represented as A ⊂ B.

        Universal Set             The set U of all the elements we might ever consider in the discourse is called the universal set.

        Complement             If A is a set, then the complement of A is the set consisting of all elements of the universal set that are not in A. It is denoted by A'. Thus             A' = { x | x ∈ U ∧ x ∉ A },   where ∉ means "is not an element of".             Example:                  If U is the set of natural numbers and A = { 1,2,3 }, then A' = { x | x ∈ U ∧ x > 3 }.

       Set Operations

            The operations that can be performed on sets are:

           1. Union                    If A and B are two sets, then the union of A and B is the set that contains all the elements that are in A and B, including the ones in both A and B. It is denoted by A ∪ B.

                   Example:  If   A = {1,2,3} and B = {3,4,5}                                    then A ∪ B = {1,2,3,4,5}

            2. Difference                     If A and B are two sets, then the difference of A from B is the set that consists of the elements of A that are not in B. It is denoted by A - B.

                    Example:  If A = {1,2,3} and B = {3,4,5}                                     then A - B = {1,2}

                    Note that in general A - B ≠ B - A.                     For A and B of the above example, B - A = {4,5}.

            3. Intersection                     If A and B are two sets, then the intersection of A and B is the set that consists of the elements in both A and B. It is denoted by A ∩ B.                     Example:  If A = {1,2,3,8} and B = {3,4,5,8}                                     then A ∩ B = {3,8}.

        Disjoint sets              A and B are said to be disjoint if they contain no elements in common,              i.e. A ∩ B = ø, where ø is the Empty set.

             Example:   A = { 1,2,3,4,5 }    and     B = { 6,8,9 } are disjoint.
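These operations correspond directly to Python's built-in set type; the following small sketch (variable names ours) replays the examples above.

    A = {1, 2, 3}
    B = {3, 4, 5}
    print(A | B)   # union: {1, 2, 3, 4, 5}
    print(A - B)   # difference: {1, 2}
    print(B - A)   # difference the other way: {4, 5}
    print({1, 2, 3, 8} & {3, 4, 5, 8})            # intersection: {3, 8}
    print({1, 2, 3, 4, 5}.isdisjoint({6, 8, 9}))  # True: no common elements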

        Following is a list of some standard Set Identities.          A, B, C represent arbitrary sets, ø is the empty set, and U is the Universal Set.

    The Commutative laws:              A ∪ B = B ∪ A              A ∩ B = B ∩ A

    The Associative laws:              A ∪ (B ∪ C) = (A ∪ B) ∪ C              A ∩ (B ∩ C) = (A ∩ B) ∩ C

    The Distributive laws:              A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)              A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

    The Idempotent laws:              A ∪ A = A              A ∩ A = A

    The Absorptive laws:              A ∪ (A ∩ B) = A              A ∩ (A ∪ B) = A

    The De Morgan laws:              (A ∪ B)' = A' ∩ B'              (A ∩ B)' = A' ∪ B'

    Other laws involving Complements:              ( A' )' = A              A ∩ A' = ø              A ∪ A' = U

    Other laws involving the empty set:              A ∪ ø = A              A ∩ ø = ø

    Other laws involving the Universal Set:              A ∪ U = U              A ∩ U = A

1.3.2 Generalized Set Operations

        Union, intersection and Cartesian product of sets are associative. For example (A ∪ B) ∪ C = A ∪ (B ∪ C) holds. To denote either of these expressions we often use A ∪ B ∪ C. This can be generalized for the union of any finite number of sets as A1 ∪ A2 ∪ ... ∪ An, which we write as

           ∪ᵢ₌₁ⁿ Ai

        This generalized union of sets can be rigorously defined as follows:

        Definition (∪ᵢ₌₁ⁿ Ai):

        Basis Clause: For n = 1,   ∪ᵢ₌₁¹ Ai = A1.

        Inductive Clause:   ∪ᵢ₌₁ⁿ⁺¹ Ai = (∪ᵢ₌₁ⁿ Ai) ∪ An+1

        Similarly the generalized intersection ∩ᵢ₌₁ⁿ Ai and the generalized Cartesian product A1 × A2 × ... × An can be defined.

        Based on these definitions, De Morgan's law on set union and intersection can also be generalized as follows:

        Theorem (Generalized De Morgan):

             (∪ᵢ₌₁ⁿ Ai)' = ∩ᵢ₌₁ⁿ Ai',     and

             (∩ᵢ₌₁ⁿ Ai)' = ∪ᵢ₌₁ⁿ Ai'

1.4 Relations                

                        Let A and B be sets. A binary relation from A into B is any subset of the Cartesian product A x B.

Example 1: Let's assume that a person owns three shirts and two pairs of slacks. More precisely, let A = {blue shirt, black shirt, mint green shirt} and B = {gray slacks, tan slacks}. Then certainly A x B is the set of all possible combinations (six) of shirts and slacks that the individual can wear. However, the individual may wish to restrict himself to combinations which are color coordinated, or "related". This may not be all possible pairs in A x B but will certainly be a subset of A x B. For example, one such subset may be { (blue shirt, gray slacks), (black shirt, tan slacks), (mint green shirt, tan slacks) }.

Example 2: Let A = {2, 3, 5, 6} and define a relation R from A into A by (a, b) ∈ R if and only if a divides evenly into b. So, R = {(2, 2), (3, 3), (5, 5), (6, 6), (2, 6), (3, 6)}. A typical element in R is an ordered pair (x, y). In some cases R can be described by actually listing the pairs which are in R, as in the previous example. This may not be convenient if R is relatively large. Other notations are used depending on past practice. Consider the following relations on real numbers:

                R = { (x, y) | y is the square of x } and S = { (x, y) | x <= y }. Then R could be more naturally expressed as R(x) = x², or R(x) = y where y = x².

1.4.1 Relation on a Set

A relation from a set A into itself is called a relation on A. The relation R of Example 2 above is a relation on A = {2, 3, 5, 6}. Let A be a set of people and let P = {(a, b) | a ∈ A ∧ b ∈ A ∧ a is a child of b}. Then P is a relation on A which we might call a parent-child relation.

1.4.2 Composition

                Let R be a relation from a set A into set B, and S be a relation from set B into set C. The composition of R and S, written as RS, is the set of pairs of the form (a, c) ∈ A × C, where (a, c) ∈ RS if and only if there exists b ∈ B such that (a, b) ∈ R and (b, c) ∈ S. For example PP, where P is the parent-child relation given above, is the composition of P with itself and it is a relation which we know as the grandparent-grandchild relation.

1.4.3 Properties of Relations

Assume R is a relation on a set A; in other words, R ⊆ A × A. Let us write a R b to denote (a, b) ∈ R.

· Reflexive: R is reflexive if for every a ∈ A, a R a.

· Symmetric: R is symmetric if for every a and b in A, if aRb, then bRa.  

· Transitive: R is transitive if for every a, b and c in A, if aRb and bRc, then aRc.

·  Equivalence: R is an equivalence relation on A if R is reflexive, symmetric and transitive.
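As a quick illustration, here is a minimal Python sketch (helper names are ours) that tests these three properties for a finite relation, using the divisibility relation of Example 2:

    from itertools import product

    def is_reflexive(R, A):
        # every a in A is related to itself
        return all((a, a) in R for a in A)

    def is_symmetric(R):
        # whenever a R b, also b R a
        return all((b, a) in R for (a, b) in R)

    def is_transitive(R):
        # whenever a R b and b R c, also a R c
        return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

    A = {2, 3, 5, 6}
    R = {(a, b) for a, b in product(A, A) if b % a == 0}
    print(is_reflexive(R, A), is_symmetric(R), is_transitive(R))  # True False True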

1.5 Functions

A function is something that associates each element of a set with an element of another set (which may or may not be the same as the first set). The concept of function appears quite often even in nontechnical contexts. For example, a social security number uniquely identifies the person, the income tax rate varies depending on the income, the final letter grade for a course is often determined by test and exam scores, homeworks and projects, and so on. In all these cases to each member of a set some member of another set is assigned. As you might have noticed, a function is quite like a relation. In fact, formally, we define a function as a special type of binary relation.

A function, denote it by f, from a set A to a set B is a relation from A to B that satisfies

1. for each element a in A, there is an element b in B such that <a, b> is in the relation, and

2. if <a, b> and <a, c> are in the relation, then b = c.

The set A in the above definition is called the domain of the function and B its codomain. Thus, f is a function if it covers the domain (maps every element of the domain) and it is single valued. The relation given by f between a and b, represented by the ordered pair <a, b>, is denoted as f(a) = b, and b is called the image of a under f. The set of images of the elements of a set S under a function f is called the image of the set S under f, and is denoted by f(S); that is, f(S) = { f(a) | a ∈ S }, where S is a subset of the domain A of f. The image of the domain under f is called the range of f.

Example: Let f be the function from the set of natural numbers N to N that maps each natural number x to x². Then the domain and codomain of this f are N, the image of, say, 3 under this function is 9, and its range is the set of squares, i.e. { 0, 1, 4, 9, 16, ... }.

Sum and product

Let f and g be functions from a set A to the set of real numbers R. Then the sum and the product of f and g are defined as follows: For all x, ( f + g )(x) = f(x) + g(x), and for all x, ( f*g )(x) = f(x)*g(x), where f(x)*g(x) is the product of the two real numbers f(x) and g(x). Example: Let f(x) = 3x + 1 and g(x) = x². Then ( f + g )(x) = x² + 3x + 1, and ( f*g )(x) = 3x³ + x².

One-to-one

A function f is said to be one-to-one (injective) if and only if whenever f(x) = f(y), x = y. Example: The function f(x) = x² from the set of natural numbers N to N is a one-to-one function. Note that f(x) = x² is not one-to-one if it is from the set of integers (negative as well as non-negative) to N, because for example f(1) = f(-1) = 1.

Onto

A function f from a set A to a set B is said to be onto (surjective) if and only if for every element y of B there is an element x in A such that f(x) = y; that is, f is onto if and only if f(A) = B. Example: The function f(x) = 2x from the set of natural numbers N to the set of non-negative even numbers E is an onto function. However, f(x) = 2x from the set of natural numbers N to N is not onto, because, for example, nothing in N can be mapped to 3 by this function.

Bijection

A function is called a bijection if it is onto and one-to-one. Example: The function f(x) = 2x from the set of natural numbers N to the set of non-negative even numbers E is one-to-one and onto. Thus it is a bijection. Every bijection has a function called the inverse function. In a diagram of such a function, the points on the left are in the domain and the ones on the right are in the codomain, and arrows show the < x, f(x) > relation.

Inverse

Let f be a bijection from a set A to a set B. Then the function g is called the inverse function of f, and it is denoted by f⁻¹, if for every element y of B, g(y) = x, where f(x) = y. Note that such an x is unique for each y because f is a bijection. In a diagram of a bijection, the inverse is obtained by reversing the direction of each arrow. Example: The inverse function of f(x) = 2x from the set of natural numbers N to the set of non-negative even numbers E is f⁻¹(x) = x/2 from E to N. It is also a bijection. A function is a relation. Therefore one can also talk about the composition of functions.

Composite function

Let g be a function from a set A to a set B, and let f be a function from B to a set C. Then the composition of functions f and g, denoted by f∘g, is the function from A to C that satisfies f∘g(x) = f( g(x) ) for all x in A. Example: Let f(x) = x² and g(x) = x + 1. Then f( g(x) ) = ( x + 1 )².
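To make this concrete, here is a minimal Python sketch (the dict encoding of finite functions is our choice, not from the text) of composition and of inverting a bijection:

    # Finite functions encoded as dicts: f maps x to x**2, g maps x to x + 1.
    f = {x: x ** 2 for x in range(5)}
    g = {x: x + 1 for x in range(4)}

    def compose(f, g):
        # f o g : x -> f(g(x))
        return {x: f[g[x]] for x in g}

    print(compose(f, g)[2])   # f(g(2)) = (2 + 1)**2 = 9

    # The inverse of a bijection reverses every <x, f(x)> arrow.
    double = {x: 2 * x for x in range(5)}
    inverse = {y: x for x, y in double.items()}
    print(inverse[6])         # 3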

1.6 Primitive recursive function

The primitive recursive functions are defined using primitive recursion and composition as central operations and are a strict subset of the recursive functions (recursive functions are also known as computable functions). The term was coined by Rózsa Péter. In computability theory, primitive recursive functions are a class of functions which form an important building block on the way to a full formalization of computability. These functions are also important in proof theory.

Many of the functions normally studied in number theory, and approximations to real-valued functions, are primitive recursive, such as addition, division, factorial, exponential, finding the nth prime, and so on (Brainerd and Landweber, 1974). In fact, it is difficult to devise a function that is not primitive recursive, although some are known (see the section on Limitations below). The set of primitive recursive functions is known as PR in complexity theory. Every primitive recursive function is a general recursive function.

The primitive recursive functions are among the number-theoretic functions which are functions from the natural numbers (nonnegative integers) {0, 1, 2 , ...} to the natural numbers which take n arguments for some natural number n. Such a function is called n-ary.

The basic primitive recursive functions are given by these axioms:

1. Constant function: The 0-ary constant function 0 is primitive recursive.

2. Successor function: The 1-ary successor function S, which returns the successor of its argument (see Peano postulates), is primitive recursive.

3. Projection function: For every n ≥ 1 and each i with 1 ≤ i ≤ n, the n-ary projection function Pᵢⁿ, which returns its i-th argument, is primitive recursive.

More complex primitive recursive functions can be obtained by applying the operators given by these axioms:

1. Composition: Given f, a k-ary primitive recursive function, and k m-ary primitive recursive functions g1,...,gk, the composition of f with g1,...,gk, i.e. the m-ary function h(x1,...,xm) = f(g1(x1,...,xm),...,gk(x1,...,xm)), is primitive recursive.

2. Primitive recursion: Given f, a k-ary primitive recursive function, and g, a (k+2)-ary primitive recursive function, the (k+1)-ary function defined as the primitive recursion of f and g, i.e. the function h where h(0,x1,...,xk) = f(x1,...,xk) and h(S(n),x1,...,xk) = g(h(n,x1,...,xk),n,x1,...,xk), is primitive recursive.

The primitive recursive functions are the basic functions and those which can be obtained from the basic functions by applying the operators a finite number of times.

Role of the projection functions

The projection functions can be used to avoid the apparent rigidity in terms of the arity of the functions above; by using compositions with various projection functions, it is possible to pass a subset of the arguments of one function to another function. For example, if g and h are 2-ary primitive recursive functions, then f(x, y) = g(h(y, x), h(x, x)) is also primitive recursive: it is the composition of g with the two functions h(P₂²(x, y), P₁²(x, y)) and h(P₁²(x, y), P₁²(x, y)), each of which is itself a composition of h with projections.

Converting predicates to numeric functions

In some settings it is natural to consider primitive recursive functions that take as inputs tuples that mix numbers with truth values { t= true, f=false }, or which produce truth values as outputs (see Kleene [1952 pp.226-227]). This can be accomplished by identifying the truth values with numbers in any fixed manner. For example, it is common to identify the truth value t with the number 1 and the truth value f with the number 0. Once this identification has been made, the characteristic function of a set A, which literally returns 1 or 0, can be viewed as a predicate that tells whether a number is in the set A. Such an identification of predicates with numeric functions will be assumed for the remainder of this article.

1.6.1 Examples

Most number-theoretic functions which can be defined using recursion on a single variable are primitive recursive. Basic examples include the addition and "limited subtraction" functions.

Addition

Intuitively, addition can be recursively defined with the rules:

add(0,x)=x,

add(n+1,x)=add(n,x)+1.

In order to fit this into a strict primitive recursive definition, define:

add(0, x) = P₁¹(x),

add(S(n), x) = S(P₁³(add(n, x), n, x)).

Here P₁³ is the projection function that takes 3 arguments and returns the first one.

P₁¹ is simply the identity function; its inclusion is required by the definition of the primitive recursion operator above; it plays the role of f. The composition of S and P₁³, which is primitive recursive, plays the role of g.
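The schema transcribes almost literally into Python; a minimal sketch (the names S, P and add are ours) mirroring the definition above:

    def S(n):
        # successor function
        return n + 1

    def P(i, *args):
        # projection: return the i-th of the given arguments (1-indexed)
        return args[i - 1]

    def add(n, x):
        # primitive recursion on the first argument
        if n == 0:
            return P(1, x)                         # add(0, x) = P_1^1(x)
        return S(P(1, add(n - 1, x), n - 1, x))    # add(S(m), x) = S(P_1^3(add(m, x), m, x))

    print(add(3, 4))  # 7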

Subtraction

Because primitive recursive functions use natural numbers rather than integers, and the natural numbers are not closed under subtraction, a limited subtraction function is studied in this context. This limited subtraction function sub(a,b) returns b − a if this is nonnegative and returns 0 otherwise.

The predecessor function acts as the opposite of the successor function and is recursively defined by the rules:

pred(0)=0,

pred(n+1)=n.

These rules can be converted into a more formal definition by primitive recursion:

pred(0) = 0,

pred(S(n)) = P₂²(pred(n), n).

The limited subtraction function is definable from the predecessor function in a manner analogous to the way addition is defined from successor:

sub(0, x) = P₁¹(x),

sub(S(n), x) = pred(P₁³(sub(n, x), n, x)).

Here sub(a,b) corresponds to b-a; for the sake of simplicity, the order of the arguments has been switched from the "standard" definition to fit the requirements of primitive recursion. This could easily be rectified using composition with suitable projections.
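Continuing the transcription, a minimal Python sketch (function names ours) of the predecessor and limited subtraction functions:

    def pred(n):
        # pred(0) = 0, pred(S(m)) = m
        return 0 if n == 0 else n - 1

    def sub(n, x):
        # sub(n, x) computes x - n, floored at 0
        return x if n == 0 else pred(sub(n - 1, x))

    print(sub(3, 5))  # 5 - 3 = 2
    print(sub(5, 3))  # 3 - 5, floored at 0 = 0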

Other primitive recursive functions include exponentiation and primality testing. Given primitive recursive functions e, f, g, and h, a function which returns the value of g when e≤f and the value of h otherwise is primitive recursive.

Operations on integers and rational numbers

By using Gödel numbers, the primitive recursive functions can be extended to operate on other objects such as integers and rational numbers. If integers are encoded by Gödel numbers in a standard way, the arithmetic operations including addition, subtraction, and multiplication are all primitive recursive. Similarly, if the rationals are represented by Gödel numbers then the field operations are all primitive recursive.

1.6.2 Relationship to recursive functions

The broader class of partial recursive functions is defined by introducing an unbounded search operator. The use of this operator may result in a partial function, that is, a relation which has at most one value for each argument but, unlike a total function, does not necessarily have any value for an argument (see domain). An equivalent definition states that a partial recursive function is one that can be computed by a Turing machine. A total recursive function is a partial recursive function which is defined for every input.

Every primitive recursive function is total recursive, but not all total recursive functions are primitive recursive. The Ackermann function A(m,n) is a well-known example of a total recursive function that is not primitive recursive. There is a characterization of the primitive recursive functions as a subset of the total recursive functions using the Ackermann function. This characterization states that a function is primitive recursive if and only if there is a natural number m such that the function can be computed by a Turing machine that always halts within A(m,n) or fewer steps, where n is the sum of the arguments of the primitive recursive function.
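For reference, the Ackermann function is short enough to state directly; a minimal Python sketch (naive and deeply recursive, so only tiny arguments are practical):

    def ackermann(m, n):
        # total recursive but not primitive recursive
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    print(ackermann(2, 3))  # 9
    print(ackermann(3, 3))  # 61; values explode rapidly beyond this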

1.6.3 Limitations

Primitive recursive functions tend to correspond very closely with our intuition of what a computable function must be. Certainly the initial functions are intuitively computable (in their very simplicity), and the two operations by which one can create new primitive recursive functions are also very straightforward. However the set of primitive recursive functions does not include every possible computable function — this can be seen with a variant of Cantor's diagonal argument. This argument provides a computable function which is not primitive recursive. A sketch of the proof is as follows:

The primitive recursive functions can be computably enumerated. This numbering is unique on the definitions of functions, though not unique on the actual functions themselves (as every function can have an infinite number of definitions; consider simply composing with the identity function). The numbering is computable in the sense that it can be defined under formal models of computability such as μ-recursive functions or Turing machines; but an appeal to the Church-Turing thesis is likely sufficient.

Now consider a matrix where the rows are the primitive recursive functions of one argument under this numbering, and the columns are the natural numbers. Then each element (i, j) corresponds to the ith unary primitive recursive function being calculated on the number j. We can write this as fi(j).

Now consider the function g(x) = S(fx(x)). g lies on the diagonal of this matrix and simply adds one to the value it finds. This function is computable (by the above), but clearly no primitive recursive function exists which computes it as it differs from each possible primitive recursive function by at least one value. Thus there must be computable functions which are not primitive recursive.

This argument can be applied to any class of computable (total) functions that can be enumerated in this way, as explained in the article Machines that always halt. Note however that the partial computable functions (those which need not be defined for all arguments) can be explicitly enumerated, for instance by enumerating Turing machine encodings.

Other examples of total recursive but not primitive recursive functions are known:

· The function which takes m to Ackermann(m,m) is a unary total recursive function that is not primitive recursive.

· The Paris–Harrington theorem involves a total recursive function which is not primitive recursive. Because this function is motivated by Ramsey theory, it is sometimes considered more “natural” than the Ackermann function.

1.6.4 Some common primitive recursive functions

The following examples and definitions are from Kleene (1952) pp. 223-231; many appear with proofs. Most also appear with similar names, either as proofs or as examples, in Boolos-Burgess-Jeffrey 2002 pp. 63-70; they add #22: the logarithm lo(x, y) or lg(x, y), depending on the exact derivation.

In the following we observe that primitive recursive functions can be of four types:

1. functions ("number-theoretic functions" for short): from { 0, 1, 2, ... } to { 0, 1, 2, ... },

2. predicates: from { 0, 1, 2, ...} to truth values { t =true, f =false },

3. propositional connectives: from truth values { t, f } to truth values { t, f },

4. representing functions: from truth values { t, f } to { 0, 1, 2, ... }. Many times a predicate requires a representing function to convert the predicate's output { t, f } to { 0, 1 } (note the order "t" to "0" and "f" to "1" matches with ~sg( ) defined below). By definition a function φ(x) is a "representing function" of the predicate P(x) if φ takes only values 0 and 1 and produces 0 when P is true.

In the following the mark " ' ", e.g. a' , is the primitive mark meaning "the successor of", usually thought of as " +1", e.g. a +1 =def a'. The functions 16-21 and #G are of particular interest with respect to converting primitive recursive predicates to, and extracting them from, their "arithmetical" form expressed as Gödel numbers.

1. Addition: a+b

2. Multiplication: a×b

3. Exponentiation: aᵇ

4. Factorial a! : 0! = 1, a'! = a!×a'

5. pred(a): Decrement: "predecessor of a", defined as "If a > 0 then a − 1 else 0"

6. Proper subtraction: a ∸ b, defined as "If a ≥ b then a − b else 0"

7. Minimum (a1, ..., an)

8. Maximum (a1, ..., an)

9. Absolute value: | a − b | =def (a ∸ b) + (b ∸ a)

10. ~sg(a): NOT[signum(a)]: If a = 0 then ~sg(a) = 1 else if a > 0 then ~sg(a) = 0

11. sg(a): signum(a): If a = 0 then sg(a) = 0 else if a > 0 then sg(a) = 1

12. "b divides a" [ a | b ]: If the remainder (a, b) = 0 then [ a | b ], else b does not divide a "evenly"

13. Remainder (a, b): the leftover if b does not divide a "evenly". Also called MOD(a, b)

14. a = b: sg(| a − b |)

15. a < b: sg( a' ∸ b )

16. Pr(a): a is a prime number: Pr(a) =def a > 1 & NOT(∃c)(1 < c < a)[ c | a ]

17. pᵢ: the i+1-st prime number

18. (a)ᵢ : the exponent of pᵢ in a: (a)ᵢ =def μx (x < a) [ NOT( pᵢ^(x+1) divides a ) ]

19. lh(a): the "length" or number of non-vanishing exponents in a

20. a×b: given the expression of a and b as prime factors then a×b is the product's expression as prime factors

21. lo(x, y): logarithm of x to the base y

· In the following, x is an abbreviation: x =def x1, ..., xn; subscripts may be applied if the meaning requires.

· #A: A function φ definable explicitly from functions Ψ and constants q1 , ... qn is primitive recursive in Ψ.

· #B: The finite sum Σ(y < z) ψ(x, y) and the finite product Π(y < z) ψ(x, y) are primitive recursive in ψ.

· #C: A predicate P obtained by substituting functions χ1 ,..., χm for the respective variables of a predicate Q is primitive recursive in χ1 ,..., χm, Q.

· #D: The following predicates are primitive recursive in Q and R:

· NOT_Q(x) .

· Q OR R: Q(x) V R(x),

· Q AND R: Q(x) & R(x),

· Q IMPLIES R: Q(x) → R(x)

· Q is equivalent to R: Q(x) ≡ R(x)

· #E: The following predicates are primitive recursive in the predicate R:

· (Ey)(y < z) R(x, y): "there exists a y < z such that R(x, y)"

· (y)(y < z) R(x, y): "for all y < z, R(x, y)"

· μy (y < z) R(x, y): "the least y < z such that R(x, y)", or z if there is no such y

· #F: Definition by cases: The function defined thus, where Q1, ..., Qm are mutually exclusive predicates (or "φ(x) shall have the value given by the first clause which applies"), is primitive recursive in φ1, ..., φm+1, Q1, ..., Qm:

φ(x) =

· φ1(x) if Q1(x) is true,

· ...

· φm(x) if Qm(x) is true,

· φm+1(x) otherwise.

· #G: If φ satisfies the equation:

φ(y, x) = χ(y, φ̄(y; x2, ..., xn), x2, ..., xn), then φ is primitive recursive in χ. So, in a sense the knowledge of the value φ̄(y; x2, ..., xn) of the course-of-values function is equivalent to the knowledge of the sequence of values φ(0, x2, ..., xn), ..., φ(y−1, x2, ..., xn) of the original function.

1.7 Countable set

In mathematics, a countable set is a set with the same cardinality (i.e., number of elements) as some subset of the set of natural numbers. The term was originated by Georg Cantor; it stems from the fact that the natural numbers are often called counting numbers. A set that is not countable is called uncountable.

A set S is called countable if there exists an injective function f : S → N, where N is the set of natural numbers. If f is also surjective, thus making f bijective, then S is called countably infinite or denumerable. As noted above, this terminology is not universal: some authors define denumerable to mean what is here called "countable"; some define countable to mean what is here called "countably infinite". The next result offers an alternative definition of a countable set S in terms of a surjective function:

THEOREM: Let S be a set. The following statements are equivalent:

1. S is countable, i.e. there exists an injective function f : S → N.

2. Either S is empty or there exists a surjective function g : N → S.

A set is a collection of elements, and may be described in many ways. One way is simply to list all of its elements; for example, the set consisting of the integers 3, 4, and 5 may be denoted {3, 4, 5}. This is only effective for small sets, however; for larger sets, this would be time-consuming and error-prone. Instead of listing every single element, sometimes an ellipsis ('…') is used, if the writer believes that the reader can easily guess what is missing; for example, {1, 2, 3, …, 100} presumably denotes the set of integers from 1 to 100. Even in this case, however, it is still possible to list all the elements, because the set is finite; it has a specific number of elements.

Some sets are infinite; these sets have more than n elements for any integer n. For example, the set of natural numbers, denotable by N, has infinitely many elements, and we can't use any normal number to give its size. Nonetheless, it turns out that infinite sets do have a well-defined notion of size (or more properly, of cardinality, which is the technical term for the number of elements in a set), and not all infinite sets have the same cardinality.

To understand what this means, we must first examine what it doesn't mean. For example, there are infinitely many odd integers, infinitely many even integers, and (hence) infinitely many integers overall. However, it turns out that the number of odd integers, which is the same as the number of even integers, is also the same as the number of integers overall. This is because we arrange things such that for every integer, there is a distinct odd integer: … −2 → −3, −1 → −1, 0 → 1, 1 → 3, 2 → 5, …; or, more generally, n → 2n + 1. What we have done here is arranged the integers and the odd integers into a one-to-one correspondence (or bijection), which is a function that maps between two sets such that each element of each set corresponds to a single element in the other set.

However, not all infinite sets have the same cardinality. For example, Georg Cantor (who introduced this branch of mathematics) demonstrated that the real numbers cannot be put into one-to-one correspondence with the natural numbers (non-negative integers), and therefore that the set of real numbers has a greater cardinality than the set of natural numbers.

A set is countable if: (1) it is finite, or (2) it has the same cardinality (size) as the set of natural numbers. Equivalently, a set is countable if it has the same cardinality as some subset of the set of natural numbers. Otherwise, it is uncountable.

THEOREM: The Cartesian product of finitely many countable sets is countable.

The standard proof enumerates pairs of natural numbers along diagonals with the triangular mapping (m, n) → (m + n)(m + n + 1)/2 + n, so that, for example, (0, 2) maps to 5. This form of triangular mapping recursively generalizes to vectors of finitely many natural numbers by repeatedly mapping the first two elements to a natural number. For example, (0, 2, 3) maps to (5, 3), which maps to 39. Sometimes more than one mapping is useful. This is where you map the set which you want to show countably infinite onto another set, and then map this other set to the natural numbers. For example, the positive rational numbers can easily be mapped to (a subset of) the pairs of natural numbers because p/q maps to (p, q).
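A minimal Python sketch of this pairing (helper names ours):

    def pair(m, n):
        # Cantor's triangular pairing: a bijection from N x N to N
        return (m + n) * (m + n + 1) // 2 + n

    def pair_tuple(t):
        # collapse a tuple by repeatedly pairing its first two elements
        while len(t) > 1:
            t = (pair(t[0], t[1]),) + t[2:]
        return t[0]

    print(pair(0, 2))             # 5
    print(pair_tuple((0, 2, 3)))  # pair(5, 3) = 39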

THEOREM: Every subset of a countable set is countable. In particular, every infinite subset of a countably infinite set is countably infinite. For example, the set of prime numbers is countable, by mapping the n-th prime number to n:

· 2 maps to 1

· 3 maps to 2

· 5 maps to 3

· 7 maps to 4

· 11 maps to 5

· 13 maps to 6

· 17 maps to 7

· 19 maps to 8

· 23 maps to 9

· etc.

THEOREM: Q (the set of all rational numbers) is countable.

Q can be defined as the set of all fractions a/b where a and b are integers and b > 0. This can be mapped onto the subset of ordered triples of natural numbers (a, b, c) such that a ≥ 0, b > 0, a and b are coprime, and c ∈ {0, 1} such that c = 0 if a/b ≥ 0 and c = 1 otherwise.

· 0 maps to (0,1,0)

· 1 maps to (1,1,0)

· −1 maps to (1,1,1)

· 1/2 maps to (1,2,0)

· −1/2 maps to (1,2,1)

· 2 maps to (2,1,0)

· −2 maps to (2,1,1)

· 1/3 maps to (1,3,0)

· −1/3 maps to (1,3,1)

· 3 maps to (3,1,0)

· −3 maps to (3,1,1)

· 1/4 maps to (1,4,0)

· −1/4 maps to (1,4,1)

· 2/3 maps to (2,3,0)

· −2/3 maps to (2,3,1)

· 3/2 maps to (3,2,0)

· −3/2 maps to (3,2,1)

· 4 maps to (4,1,0)

· −4 maps to (4,1,1)

· ...

THEOREM: (Assuming the axiom of countable choice) The union of countably many countable sets is countable. For example, given countable sets a, b, c ...

Using a variant of the triangular enumeration we saw above:

· a0 maps to 0

· a1 maps to 1

· b0 maps to 2

· a2 maps to 3

· b1 maps to 4

· c0 maps to 5

· a3 maps to 6

· b2 maps to 7

· c1 maps to 8

· d0 maps to 9

· a4 maps to 10

· ...

Note that this only works if the sets a, b, c,... are disjoint. If not, then the union is even smaller and is therefore also countable by a previous theorem.

THEOREM: The set of all finite-length sequences of natural numbers is countable.

This set is the union of the length-1 sequences, the length-2 sequences, the length-3 sequences, and so on, each of which is a countable set (a finite Cartesian product). So we are talking about a countable union of countable sets, which is countable by the previous theorem.

THEOREM: The set of all finite subsets of the natural numbers is countable.

If you have a finite subset, you can order the elements into a finite sequence. There are only countably many finite sequences, so also there are only countably many finite subsets.

THEOREM: Suppose a set R

1. is linearly ordered, and

2. contains at least two members, and

3. is densely ordered, i.e., between any two members there is another, and

4. has the following least upper bound property. If R is partitioned into two nonempty sets A and B in such a way that every member of A is less than every member of B, then there is a boundary point c (in R), so that every point less than c is in A and every point greater than c is in B.

Then R is not countable. The set of real numbers with its usual ordering is a typical example of such an ordered set R; other examples are real intervals of non-zero width (possibly with half-open gaps) and surreal numbers. The set of rational numbers (which is countable) has properties 1, 2, and 3 but does not have property 4.

The proof

The proof is by contradiction. It begins by assuming R is countable and thus that some sequence x1, x2, x3, ... has all of R as its range. Define two other sequences (an) and (bn) as follows:

Pick a1 < b1 in R (possible because of property 2).

Let an+1 be the first element in the sequence x that is strictly between an and bn (possible because of property 3).

Let bn+1 be the first element in the sequence x that is strictly between an+1 and bn.

The two monotone sequences a and b move toward each other. By the completeness of R, some point c must lie between them. (Define A to be the set of all elements in R that are smaller than some member of the sequence a, and let B be the complement of A; then every member of A is smaller than every member of B, and so property 4 yields the point c.) Since c is an element of R and the sequence x represents all of R, we must have c = xi for some index i (i.e., there must exist an xi in the sequence x, corresponding to c.) But, when that index was reached in the process of defining the sequences a and b, c would have been added as the next member of one or the other sequence, contrary to the fact that c lies strictly between the two sequences. This contradiction finishes the proof.

1.8 Fundamental Proof Techniques

1. The Principle of Mathematical Induction

2. The Pigeonhole Principle

3. The Diagonalization Principle

1.8.1 The Principle of Mathematical Induction

Suppose there is a given statement P(n) involving the natural number n such that

(i) The statement is true for n = 1, i.e., P(1) is true, and

(ii) If the statement is true for n = k (where k is some positive integer), then the statement is also true for n = k + 1, i.e., truth of P(k) implies the truth of P (k + 1). Then, P(n) is true for all natural numbers n.

Property (i) is simply a statement of fact. There may be situations when a statement is true for all n ≥ 4. In this case, step 1 will start from n = 4 and we shall verify the result for n = 4, i.e., P(4). Property (ii) is a conditional property. It does not assert that the given statement is true for n = k, but only that if it is true for n = k, then it is also true for n = k + 1. So, to prove that the property holds for all n, we need only prove the conditional proposition: If the statement is true for n = k, then it is also true for n = k + 1. This is sometimes referred to as the inductive step. The assumption that the given statement is true for n = k in this inductive step is called the inductive hypothesis.

For example, frequently in mathematics, a formula will be discovered that appears to fit a pattern like

1 = 1²

4 = 2² = 1 + 3

9 = 3² = 1 + 3 + 5

16 = 4² = 1 + 3 + 5 + 7, etc.

It is worth noting that the sum of the first two odd natural numbers is the square of the second natural number, the sum of the first three odd natural numbers is the square of the third natural number, and so on. Thus, from this pattern it appears that 1 + 3 + 5 + 7 + ... + (2n – 1) = n², i.e., the sum of the first n odd natural numbers is the square of n.

Let us write P(n): 1 + 3 + 5 + 7 + ... + (2n – 1) = n². We wish to prove that P(n) is true for all n. The first step in a proof that uses mathematical induction is to prove that P(1) is true. This step is called the basic step. Obviously 1 = 1², i.e., P(1) is true.

The next step is called the inductive step. Here, we suppose that P (k) is true for some positive integer k and we need to prove that P (k + 1) is true. Since P (k) is true, we have

1 + 3 + 5 + 7 + ... + (2k – 1) = k² ... (1).

Consider

1 + 3 + 5 + 7 + ... + (2k – 1) + {2(k + 1) – 1} ... (2)

= k² + (2k + 1) = (k + 1)² [Using (1)].

Therefore, P (k + 1) is true and the inductive proof is now completed. Hence P(n) is true for all natural numbers n.
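A quick empirical spot-check in Python (testing small cases does not replace the induction; it merely illustrates the pattern):

    # check 1 + 3 + ... + (2n - 1) == n**2 for n = 1..10
    for n in range(1, 11):
        assert sum(2 * k - 1 for k in range(1, n + 1)) == n ** 2
    print("P(n) verified for n = 1..10")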

1.8.2 The Pigeonhole Principle

· Let A and B be finite sets with |A| > |B|. Then there does not exist a one-to-one function f : A → B.

· Thus, if |A| > |B|, then for any function f : A → B, it must be the case that f(a1) = f(a2) for some distinct a1, a2 ∈ A.

· That is, if you have more pigeons than pigeonholes, then at least two pigeons must occupy the same pigeonhole.

Example:

· Theorem: Let G be a graph with n vertices. If there is a path from vertex A to vertex B (A ≠ B), then the shortest path from A to B has length at most n – 1.

· (The length of a path is 1 less than the number of vertices along that path.)

· Proof:

· Let (a0, a1, a2, …, am), where a0 = A and am = B, be the shortest path from A to B.

· This path has length m.

· Suppose that m > n – 1.

· Then the path visits m + 1 > n vertices, so by the pigeonhole principle some vertex ai must be repeated along the path as aj (with i < j).

· That means that the path contains a loop from ai to aj .

· This loop may be excised, resulting in a shorter path (a0, …, ai−1, ai, aj+1, …, am) from A to B.

· That is a contradiction. Therefore, the shortest path must have length no more than n – 1.

1.8.3 The Diagonalization Principle

· Let R be a binary relation on a set A.

· Let D be the diagonal set for R:

D = {a ∈ A | (a, a) ∉ R}.

· For each a in A, define a set Ra to be

Ra = {b ∈ A | (a, b) ∈ R}.

· Then D is distinct from each Ra.

Example 1:

· Theorem: The set 2ᴺ (the set of all subsets of N) is uncountable.

· Proof:

· Suppose that 2ᴺ is countable.

· Then there is a one-to-one function f : 2ᴺ → N.

· That is, the elements of 2ᴺ may be listed A0, A1, A2, …

· Define the diagonal set D to be D = {i | i ∉ Ai}.

· Then D ≠ Ai for all i. (Why?)

· But D is a subset of N, so it must equal some Ai.

· That is a contradiction. Therefore, 2ᴺ is not countable.

Example 2:

· Theorem: The set of all functions from N to N is uncountable.

· Proof:

· Suppose that the set is countable.

· Then its elements can be listed f0, f1, f2, …

· Define a function g as g(i) = 0 if fi(i) ≠ 0, and g(i) = 1 if fi(i) = 0.

· Then g ≠ fi for all i.

· But g is a function from N to N, so it must equal some fi.

· That is a contradiction. Therefore, the set is not countable.

Example 3:

· Theorem: The set of real numbers in the interval [0, 1) is uncountable.

· Proof:

· Suppose that the set is countable.

· Then its elements can be listed x0, x1, x2, …

· Write each xi in its decimal expansion: xi = 0.di0di1di2…

· Define the i-th digit of a number x to be 0 if dii ≠ 0 and 1 if dii = 0.

· Then x ≠ xi for all i.

· But x is a real number in [0, 1), so it must equal some xi.

· That is a contradiction. Therefore, the set is not countable.

1.9 Formal representation of languages

First, an alphabet is a finite set of symbols. For example, {0, 1} is an alphabet with two symbols, {a, b} is another alphabet with two symbols, and the English alphabet is also an alphabet. A string (also called a word) is a finite sequence of symbols of an alphabet. b, a and aabab are examples of strings over the alphabet {a, b}, and 0, 10 and 001 are examples of strings over the alphabet {0, 1}.

A language is a set of strings over an alphabet. Thus {a, ab, baa} is a language (over the alphabet {a, b}) and {0, 111} is a language (over the alphabet {0, 1}). The number of symbols in a string is called the length of the string. For a string w its length is represented by |w|. It can be defined more formally by a recursive definition. The empty string (also called the null string) is the string with length 0. That is, it has no symbols. The empty string is denoted by Λ (capital lambda). Thus |Λ| = 0.

Let u and v be strings. Then uv denotes the string obtained by concatenating u with v, that is, uv is the string obtained by appending the sequence of symbols of v to that of u. For example if u = aab and v = bbab, then uv = aabbbab. Note that vu = bbabaab ≠ uv. We are going to use the first few symbols of the English alphabet such as a and b to denote symbols of an alphabet and those toward the end such as u and v for strings. A string x is called a substring of another string y if there are strings u and v such that y = uxv. Note that u and v may be empty strings. So a string is a substring of itself. A string x is a prefix of another string y if there is a string v such that y = xv. v is called a suffix of y.

Some special languages

The empty set ø is a language which has no strings. The set {Λ} is a language which has one string, namely Λ. Though Λ has no symbols, this set has an object in it. So it is not empty. For any alphabet Σ, the set of all strings over Σ (including the empty string) is denoted by Σ*. Thus a language over an alphabet Σ is a subset of Σ*.

1.9.1 Operations on languages

Since languages are sets, all the set operations can be applied to languages. Thus the union, intersection and difference of two languages over an alphabet Σ are languages over Σ. The complement of a language L over an alphabet Σ is Σ* - L, and it is also a language. Another operation on languages is concatenation. Let L1 and L2 be languages. Then the concatenation of L1 with L2 is denoted as L1L2 and it is defined as L1L2 = { uv | u ∈ L1 and v ∈ L2 }. That is, L1L2 is the set of strings obtained by concatenating strings of L1 with those of L2. For example {ab, b}{aaa, abb, aaba} = {abaaa, ababb, abaaba, baaa, babb, baaba}.

Powers: For a symbol a ∈ Σ and a natural number k, aᵏ represents the concatenation of k a's. For a string u ∈ Σ* and a natural number k, uᵏ denotes the concatenation of k u's. Similarly for a language L, Lᵏ means the concatenation of k L's. Hence Lᵏ is the set of strings that can be obtained by concatenating k strings of L. These powers can be formally defined recursively. For example Lᵏ can be defined recursively as follows.

Recursive definition of Lᵏ:

Basis Clause: L⁰ = { Λ }. Inductive Clause: Lᵏ⁺¹ = Lᵏ L. Since Lᵏ is defined for natural numbers k, the extremal clause is not necessary. aᵏ and uᵏ can be defined similarly. Here a⁰ = Λ and u⁰ = Λ. The following two types of languages are generalizations of Σ* and we are going to see them quite often in this course.

Recursive definition of L*:

Basis Clause: Λ ∈ L*.

Inductive Clause: For any x ∈ L* and any w ∈ L, xw ∈ L*. Extremal Clause: Nothing is in L* unless it is obtained from the above two clauses. L* is the set of strings obtained by concatenating zero or more strings of L, as we are going to see in Theorem 1. This * is called the Kleene star. For example if L = { aba, bb }, then L* = { Λ, aba, bb, ababb, abaaba, bbbb, bbaba, ... }. The * in Σ* is the same Kleene star defined above.

Recursive definition of L+:

Basis Clause: Every string of L is in L+ (L ⊆ L+).

Inductive Clause: For any x ∈ L+ and any w ∈ L, xw ∈ L+. Extremal Clause: Nothing is in L+ unless it is obtained from the above two clauses. Thus L+ is the set of strings obtained by concatenating one or more strings of L. For example if L = { aba, bb }, then L+ = { aba, bb, ababb, abaaba, bbbb, bbaba, ... }

Let us also define L⁰ ∪ L¹ ∪ L² ∪ ... = { x | x ∈ Lᵏ for some natural number k }. Then the following relationships hold on L* and L+. Theorems 1 and 2 are proven in "General Induction", which you study in the next unit. Other proofs are omitted.

Theorem 1: Lⁿ ⊆ L* for any natural number n.

Theorem 2: L* = L⁰ ∪ L¹ ∪ L² ∪ ...

Theorem 3: L+ = L¹ ∪ L² ∪ L³ ∪ ...

Theorem 4: L+ = L L* = L* L

Note: According to Theorems 2 and 3, any nonempty string in L* or L+ can be expressed as the concatenation of strings of L, i.e. w1w2...wk for some k, where the wi's are strings of L. L* and L+ have a number of interesting properties. Let us list one of them as a theorem and prove it.

Theorem 5:   L* = (L*)*.

Proof: Since L ⊆ L* by Theorem 1, applying that fact to the language L* gives L* ⊆ (L*)*. Conversely, (L*)* ⊆ L* can be proven as follows: Let x be an arbitrary nonempty string of (L*)*. Then there are nonempty strings w1, w2, ..., wk in L* such that x = w1w2...wk. Since w1, w2, ..., wk are strings of L*, for each wi there are strings wi1, wi2, ..., wimi in L such that wi = wi1wi2...wimi. Hence x = w11...w1m1w21...w2m2...wk1...wkmk, so x is in L*. If x is an empty string, then it is obviously in L*. Hence (L*)* ⊆ L*. Since L* ⊆ (L*)* and (L*)* ⊆ L*, L* = (L*)*.
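A minimal Python sketch of these language operations (names ours; since L* is infinite, the star is approximated up to a fixed power k):

    def concat(L1, L2):
        # concatenation of languages: { uv | u in L1, v in L2 }
        return {u + v for u in L1 for v in L2}

    def power(L, k):
        # L^k: L^0 = {""}, L^(k+1) = L^k L  ("" plays the role of Lambda)
        result = {""}
        for _ in range(k):
            result = concat(result, L)
        return result

    def star_up_to(L, k):
        # finite approximation of L*: the union of L^0, ..., L^k
        out = set()
        for i in range(k + 1):
            out |= power(L, i)
        return out

    L = {"aba", "bb"}
    print(sorted(star_up_to(L, 2)))
    # ['', 'aba', 'abaaba', 'ababb', 'bb', 'bbaba', 'bbbb']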

1.10 Regular expression

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

The following examples illustrate a few specifications that could be expressed in a regular expression:

· the sequence of characters "car" in any context, such as "car", "cartoon", or "bicarbonate"

· the word "car" when it appears as an isolated word

· the word "car" when preceded by the word "blue" or "red"

· a dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits

Regular expressions can be much more complex than these examples. Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. For example, Perl and Tcl have a powerful regular expression engine built directly into their syntax. Several utilities provided by Unix distributions—including the editor ed and the filter grep—were the first to popularize the concept of regular expressions.

As an example of the syntax, the regular expression \bex can be used to search for all instances of the string "ex" that occur after word boundaries (signified by the \b). Thus in the string "Texts for experts," \bex matches the "ex" in "experts" but not in "Texts" (because the "ex" occurs inside a word and not immediately after a word boundary).
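Python's re module reproduces this example directly:

    import re

    text = "Texts for experts"
    # \bex matches "ex" only at a word boundary: in "experts", not inside "Texts".
    print([m.start() for m in re.finditer(r'\bex', text)])  # [10]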

Many modern computing systems provide wildcard characters in matching filenames from a file system. This is a core capability of many command-line shells and is also known as globbing. Wildcards differ from regular expressions in that they generally only express very limited forms of alternatives.

Basic concepts

A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements.

1.11 Formal language theory

Regular expressions can be expressed in terms of formal language theory. Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ the following constants are defined:

· (empty set) ∅ denoting the set ∅

· (empty string) ε denoting the set {ε}

· (literal character) a in Σ denoting the set {a}

The following operations are defined:

· (concatenation) RS denoting the set { αβ | α in R and β in S }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.

· (alternation) R|S denoting the set union of R and S. Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar. For example {"ab", "c"}∪{"d", "ef"} = {"ab", "c", "d", "ef"}

· (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.

The above constants and operators form a Kleene algebra. To avoid brackets it is assumed that the Kleene star has the highest priority, then concatenation and then set union. If there is no ambiguity then brackets may be omitted. For example, (ab)c can be written as abc, and a|(b(c*)) can be written as a|bc*.

Examples:

· a|b* denotes {ε, a, b, bb, bbb, ...}

· (a|b)* denotes the set of all strings with no symbols other than a and b, including the empty string: {ε, a, b, aa, ab, ba, bb, aaa, ...}

· ab*(c|ε) denotes the set of strings starting with a, then zero or more bs and finally optionally a c: {a, ac, ab, abc, abb, ...}

The formal definition of regular expressions is purposely parsimonious and avoids defining the redundant quantifiers ? and +, which can be expressed as follows: a+ = aa*, and a? = (a|ε). Sometimes the complement operator ~ is added; ~R denotes the set of all strings over Σ* that are not in R. The complement operator is redundant, as it can always be expressed by using the other operators (although the process for computing such a representation is complex, and the result may be exponentially larger).

Regular expressions in this sense can express exactly the regular languages, the class of languages accepted by finite state automata. There is, however, a significant difference in compactness: some regular languages can only be described by deterministic automata that grow exponentially in size, while the length of the required regular expressions grows only linearly. Regular expressions correspond to the type-3 grammars of the Chomsky hierarchy. On the other hand, there is a simple mapping from regular expressions to nondeterministic finite automata (NFAs) that does not lead to such a blowup in size; for this reason NFAs are often used as alternative representations of regular expressions.

It is possible to write an algorithm that decides whether two given regular expressions describe the same language: reduce each expression to a minimal deterministic finite state machine and determine whether the two minimal machines are isomorphic (equivalent).
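
The minimal-DFA route is the classical one; equivalence can also be decided directly by exploring pairs of states in lockstep, which is the approach sketched below. This is a standard alternative, not the minimization algorithm itself; the encoding (complete DFAs as nested dictionaries) and all names are illustrative assumptions:

from collections import deque

def dfa_equivalent(start1, start2, delta1, delta2, accept1, accept2, alphabet):
    # Breadth-first search over pairs of states reachable on the same input.
    # The DFAs accept different languages iff some reachable pair disagrees
    # on acceptance.
    seen = {(start1, start2)}
    queue = deque([(start1, start2)])
    while queue:
        p, q = queue.popleft()
        if (p in accept1) != (q in accept2):
            return False               # a distinguishing string exists
        for a in alphabet:
            pair = (delta1[p][a], delta2[q][a])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Two DFAs over {a, b} for "an even number of a's", with different state names:
d1 = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
d2 = {"e": {"a": "o", "b": "e"}, "o": {"a": "e", "b": "o"}}
print(dfa_equivalent(0, "e", d1, d2, {0}, {"e"}, "ab"))   # True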

To what extent can this redundancy be eliminated? Can we find an interesting subset of regular expressions that is still fully expressive? Kleene star and set union are obviously required, but perhaps we can restrict their use. This turns out to be a surprisingly difficult problem. As simple as regular expressions are, there is no method to systematically rewrite them to some normal form. The lack of an axiomatization in the past led to the star height problem. More recently, Cornell University professor Dexter Kozen axiomatized regular expressions with Kleene algebra.

Patterns for non-regular languages

Many features found in modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, the ability to group subexpressions with parentheses and recall the value they match in the same expression means that a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory. The pattern for these strings is (.*)\1. However, the language of squares is not regular, nor is it context-free. Pattern matching with an unbounded number of back references, as supported by numerous modern tools, is NP-hard.
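
This is easy to observe with a modern engine; a minimal Python sketch (fullmatch forces the entire string to have the form ww, and the helper name is ad hoc):

import re

def is_square(s):
    # Group 1 captures a prefix; \1 requires that same text to repeat,
    # so the whole string must be some word w followed by w again.
    return re.fullmatch(r"(.*)\1", s) is not None

print(is_square("WikiWiki"))   # True  ("Wiki" repeated)
print(is_square("papa"))       # True  ("pa" repeated)
print(is_square("paper"))      # False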

However, many tools, libraries, and engines that provide such constructions still use the term regular expression for their patterns. This has led to a nomenclature where the term regular expression has different meanings in formal language theory and pattern matching. For this reason, some people have taken to using the term regex or simply pattern to describe the latter. Larry Wall (author of Perl) writes in Apocalypse 5:

'Regular expressions' [...] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).[2]

Implementations and running times

There are at least two fundamentally different algorithms that decide whether and how a given regular expression matches a string. The oldest and fastest relies on a result in formal language theory that allows every nondeterministic finite state machine (NFA) to be transformed into a deterministic finite state machine (DFA). The algorithm performs or simulates this transformation and then runs the resulting DFA on the input string, one symbol at a time. The latter process takes time linear in the length of the input string. More precisely, an input string of size n can be tested against a regular expression of size m in time O(n + 2^m) or O(nm), depending on the details of the implementation. This algorithm is often referred to as DFA. It is fast, but can only be used for matching, not for recalling grouped subexpressions, lazy quantification, and several other features commonly found in modern regular expression libraries. It is also possible to run the NFA directly, essentially building each DFA state on demand and then discarding it at the next step. This avoids the exponential memory requirements of a fully constructed DFA while still guaranteeing linear-time search.[3]
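
The "DFA state built on demand" idea amounts to tracking the set of NFA states reachable after each input symbol. The sketch below is a minimal illustration: epsilon moves are omitted for brevity, and the transition-table encoding is an assumption made for this example:

def run_nfa(delta, start, accepting, s):
    # 'current' is the DFA state built on demand: the set of NFA states
    # reachable after reading the input so far.
    current = {start}
    for ch in s:
        current = {t for q in current for t in delta.get((q, ch), ())}
        if not current:
            return False               # no run of the NFA survives
    return bool(current & accepting)

# NFA for (0|1)*111 : state 0 loops, then guesses where the final 111 begins.
delta = {
    (0, "0"): {0}, (0, "1"): {0, 1},
    (1, "1"): {2},
    (2, "1"): {3},
}
print(run_nfa(delta, 0, {3}, "010111"))  # True
print(run_nfa(delta, 0, {3}, "1101"))    # False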

The other algorithm is to match the pattern against the input string by backtracking. This algorithm is commonly called NFA, but this terminology can be confusing. Its running time can be exponential, which simple implementations exhibit when matching against expressions like (a|aa)*b that contain both alternation and unbounded quantification and force the algorithm to consider an exponentially increasing number of sub-cases. More complex implementations will often identify and speed up or abort common cases where they would otherwise run slowly.
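
Python's re module is such a backtracking implementation, so the blowup can be observed directly. A minimal sketch (absolute timings depend on the machine; the point is how fast they grow as n increases):

import re
import time

# (a|aa)*b against a string of a's with no b forces the engine to try
# exponentially many ways of splitting the a's before it can report failure.
pattern = re.compile(r"(a|aa)*b")
for n in (28, 30, 32):
    t0 = time.perf_counter()
    pattern.match("a" * n)             # never matches, but failing is costly
    print(n, round(time.perf_counter() - t0, 3), "seconds")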

Although backtracking implementations offer only an exponential worst-case guarantee, they provide much greater flexibility and expressive power. For example, any implementation that allows the use of backreferences, or implements the various improvements introduced by Perl, must use backtracking.

Some implementations try to provide the best of both algorithms by first running a fast DFA match to see if the string matches the regular expression at all, and only in that case perform a potentially slower backtracking match.

1.11.1 Uses of regular expressions

Regular expressions are useful in the production of syntax highlighting systems, data validation, and many other tasks. While regular expressions would be useful on search engines such as Google or Live Search, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex. Although in many cases system administrators can run regex-based queries internally, most search engines do not offer regex support to the public. A notable exception is Google Code Search.

1.12 Regular Expressions

Just as finite automata are used to recognize patterns of strings, regular expressions are used to generate patterns of strings. A regular expression is an algebraic formula whose value is a pattern consisting of a set of strings, called the language of the expression.

Operands in a regular expression can be:

· characters from the alphabet over which the regular expression is defined.

· variables whose values are any pattern defined by a regular expression.

· epsilon which denotes the empty string containing no characters.

· null which denotes the empty set of strings.

Operators used in regular expressions include:

· Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is also a regular expression.

L(R1|R2) = L(R1) U L(R2).

· Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular expression.

L(R1R2) = L(R1) concatenated with L(R2).

· Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression.

L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...

Closure has the highest precedence, followed by concatenation, followed by union.

Examples

The set of strings over {0,1} that end in 3 consecutive 1's.

(0 | 1)* 111

The set of strings over {0,1} that have at least one 1.

0* 1 (0 | 1)*

The set of strings over {0,1} that have at most one 1.

0* | 0* 1 0*
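
These first three patterns can be checked mechanically; a small Python sketch, with the patterns transcribed into Python's syntax (whitespace removed, fullmatch used so the whole string must match):

import re

ends_in_111  = re.compile(r"(0|1)*111")
at_least_one = re.compile(r"0*1(0|1)*")
at_most_one  = re.compile(r"0*|0*10*")

for s in ("0111", "111", "010", "0", "0110"):
    print(s,
          bool(ends_in_111.fullmatch(s)),
          bool(at_least_one.fullmatch(s)),
          bool(at_most_one.fullmatch(s)))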

The set of strings over {A..Z,a..z} that contain the word "main".

Let letter = A | B | ... | Z | a | b | ... | z

letter* main letter*

The set of strings over {A..Z,a..z} that contain 3 x's.

letter* x letter* x letter* x letter*

The set of identifiers in Pascal

Let letter = A | B | ... | Z | a | b | ... | z

Let digit = 0 | 1 | 2 | 3 ... | 9

letter ( letter | digit )*

The set of real numbers in Pascal

Let digit = 0 | 1 | 2 | 3 ... | 9

Let sign = '+' | '-' | epsilon

Let exponent = 'E' sign digit digit* | epsilon

Let fraction = '.' digit digit* | epsilon

digit digit* fraction exponent
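
Assembling the named pieces into one pattern gives a quick sanity check. The Python sketch below is a simplified rendering of the definitions above (not the full Pascal numeric syntax), with the names reused from the reconstruction:

import re

digit    = r"[0-9]"
sign     = r"[+-]?"                    # '+' | '-' | epsilon
fraction = rf"(\.{digit}+)?"           # '.' digit digit* | epsilon
exponent = rf"(E{sign}{digit}+)?"      # 'E' sign digit digit* | epsilon
real     = re.compile(rf"{digit}+{fraction}{exponent}")

for s in ("3", "3.14", "31.4E-1", "3.", ".5"):
    print(s, bool(real.fullmatch(s)))
# 3, 3.14 and 31.4E-1 match; '3.' and '.5' do not, since a decimal point
# must be followed by at least one digit and the number must start with one.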

1.13 Chomsky hierarchy

Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy (occasionally referred to as Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described by Noam Chomsky in 1956 (see [1]). It is also named after Marcel-Paul Schützenberger who played a crucial role in the development of the theory of formal languages.

1.13.1 Formal grammars

A formal grammar of the kind used in the Chomsky hierarchy consists of:

· a finite set of terminal symbols

· a finite set of nonterminal symbols

· a finite set of production rules, each with a left-hand side and a right-hand side consisting of sequences of these symbols

· a start symbol

A formal grammar defines (or generates) a formal language, which is a (usually infinite) set of finite-length sequences of symbols (i.e. strings) that may be constructed by applying production rules to a sequence that initially contains just the start symbol. A rule is applied by replacing an occurrence of its left-hand side in the sequence with its right-hand side. A sequence of rule applications is called a derivation. The grammar then defines the formal language consisting of all words built solely from terminal symbols that can be reached by a derivation from the start symbol.

Nonterminals are usually represented by uppercase letters, terminals by lowercase letters, and the start symbol by S. For example, the grammar with terminals {a,b}, nonterminals {S,A,B}, production rules

S → ABS

S → ε (where ε is the empty string)

BA → AB

BS → b

Bb → bb

Ab → ab

Aa → aa

and start symbol S, defines the language of all words of the form a^n b^n (i.e. n copies of a followed by n copies of b). The following simpler grammar defines the same language: terminals {a,b}, nonterminals {S}, start symbol S, production rules

S → aSb

S → ε
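
A derivation in this grammar can be traced mechanically: apply S → aSb some number of times, then erase S with S → ε. A minimal Python sketch (the helper name derive is ad hoc):

def derive(n):
    # Print each sentential form while deriving a^n b^n from S.
    form = "S"
    print(form)
    for _ in range(n):
        form = form.replace("S", "aSb", 1)   # apply S -> aSb
        print(form)
    form = form.replace("S", "", 1)          # apply S -> epsilon
    print(form)
    return form

print(derive(3) == "aaabbb")   # S, aSb, aaSbb, aaaSbbb, aaabbb; prints True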

1.13.2 The hierarchy

The Chomsky hierarchy consists of the following levels:

· Type-0 grammars (unrestricted grammars) include all formal grammars. They generate exactly all languages that can be recognized by a Turing machine. These languages are also known as the recursively enumerable languages. Note that this is different from the recursive languages, which can be decided by an always-halting Turing machine.

· Type-1 grammars (context-sensitive grammars) generate the context-sensitive languages. These grammars have rules of the form αAβ → αγβ with A a nonterminal and α, β and γ strings of terminals and nonterminals. The strings α and β may be empty, but γ must be nonempty. The rule S → ε is allowed if S does not appear on the right-hand side of any rule. The languages described by these grammars are exactly all languages that can be recognized by a linear bounded automaton (a nondeterministic Turing machine whose tape is bounded by a constant times the length of the input).

· Type-2 grammars (context-free grammars) generate the context-free languages. These are defined by rules of the form A → γ with A a nonterminal and γ a string of terminals and nonterminals. These languages are exactly all languages that can be recognized by a nondeterministic pushdown automaton. Context-free languages are the theoretical basis for the syntax of most programming languages.

· Type-3 grammars (regular grammars) generate the regular languages. Such a grammar restricts its rules to a single nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed (or preceded, but not both in the same grammar) by a single nonterminal. The rule S → ε is also allowed here if S does not appear on the right-hand side of any rule. These languages are exactly all languages that can be decided by a finite state automaton. Additionally, this family of formal languages can be described by regular expressions. Regular languages are commonly used to define search patterns and the lexical structure of programming languages.

Note that the set of grammars corresponding to the recursive languages is not a member of this hierarchy. Every regular language is context-free, every context-free language is context-sensitive, every context-sensitive language is recursive, and every recursive language is recursively enumerable. All of these inclusions are proper, meaning that there exist recursively enumerable languages which are not context-sensitive, context-sensitive languages which are not context-free, and context-free languages which are not regular.

The following table summarizes each of Chomsky's four types of grammars, the class of language it generates, the type of automaton that recognizes it, and the form its rules must have.