Module 1

Module 1 MCA 202 DATA STRUCTURES ADMN 2009 - 10

MODULE 1

ALGORITHM AND ARRAYS

INTRODUCTION AND OVERVIEW

DATA

The term data means value or set of values. It can be used in both singular and plural terms. The following are some examples of data:

34

12/01/1965

ISBN 81-203-0000-0

21, 25, 67, 89, 90

Pascal

In the above mentioned example values are represented in different ways. Here each value or a collection of all such value is termed data.

ENTITY

An entity is one that has certain attributes and which may be assigned values. For example an employee in an organization is an entity, and the possible attributes are:

Entity : EMPLOYEE

Attributes : NAME DOB SEX DESIGNATION

Values : RAVI.A 30/12/81 M DIRECTOR

All the employees in an organization constitute an entity set. Each attribute of an entity set has a range of values and this range is called the domain of the attribute.

INFORMATION

The term information is used for data with its attribute(s).In other words, information can be defined as meaningful data or processed data.

For example the set of data represented above becomes information related to a person when the user imposes meanings as mentioned below in Table 1.1.

Department of Computer Sciences and Applications,SJCET,Palai Page 1


RELATION BETWEEN DATA AND INFORMATION

DATA TYPE

A data type is a term which refers to the kind of data that may appear in a computation of data. Table 1.2 illustrates the data and its corresponding data type.

CONCEPT OF DATA STRUCTURES


DataProcedure to process

data

Information_1

Information_2

Information_3

Information_iFig: 1.1

Data Data Type

38 Numeric (Integers)

12/01/1965 Date

ISBN 81-203-0000-0 Alphanumeric

Graphics

21, 25, 28,30,35,38 Array of Integers.

Pascal Nick name

Table 1.1

Data Meaning

38 Age of the person

12/01/1965 Date of birth of the person

ISBN 81-203-0000-0 Book number recently published by the person

Number of awards equal to tally marks received by the person

21, 25, 28,30,35,38 Important ages of the person

Pascal Nickname of the person.

Table 1.2


A digital computer manipulates only primitive data, that is of 0’s and 1’s.Manipulation of primitive data does not require any extra user effort. But in real life applications various kinds of data other than primitive data are involved, and manipulation of these data requires the following tasks:

Storage representation of user data: User data should be represented in such a way that the computer should understand it.

Retrieval of stored data: Data retrieved from the computer should be in such a way that the user should understand it.

Transformation of data: Various operations which is required to be performed on user data so that it can be transformed from one form to another.

Basically a data structure constitute the fundamentals of computer science and implies the Domain (The range of value that the data may have), Function (This is the set of operations that can be legally be applied in the data object), Axioms (This refers to the set of rules with which the different operations belonging to the function can actually be implemented). Data Structures is the way of organizing data to do operations on a set of data. The types of data include numeric data and Alpha numeric data. The data may be a single value or a set of values, which is to be processed and organized .This organization of a collection of data leads to structuring of data. Therefore data structure deals with structuring of data.

A data structure D is a triplet, that is, D= (D, F, A) where ‘D’ is the set of data objects, ‘F’ is a set of functions, and ‘A’ is a set of rules to implement the functions.

D = (0, ±1, ±2, ±3,…..)

F = (+,-,*, /, %)

A = (A set of binary arithmetic’s to perform addition, substarction, division, multiplication,

and modulo operations.)

DEFINITION:

Data structure is the representation of logical relationship existing between individual elements of data. In other words, data structure is the way of organizing the data items which consider not only the item stored, but also relationship with individual units of data. There are specific operations performed on the various types of data structures such as list, stack, queue, tree, graph, hash table etc.

Abstract Data Type (ADT)

The data structure with specific operations such as insertion, deletion, searching, sorting, merging, traversing and display are called Abstract Data Type (ADT), which are user defined data types.

E.g.: LIST →ADT

Item →Components



Operations →Insertion, deletion, search, sort and display.

Overview of data structures

Data Structure specifies,

The Organization of data.

Accessing methods.

Degree of associativety.

Processing alternatives of the Information.

The data structure affects both structural and functional aspects of a program.

i.e, ALGORITHM + DATA STRUCTURES = PROGRAM

The Algorithm is a step by step procedure to solve a problem and the data structure is the way of organizing data, maintaining the logical relationship.

CLASSIFICATION OF DATA STRUCTURE

Data Structure is classified basically into two types as,

1. Primitive

2. Non primitive data structures.

Primitive Data Structures

These are the basic data structures which are directly operated by machine instructions.

Eg: Integer, Floating point, Pointer values, Strings.

Non- Primitive Data Structures


DATA SRUCTUREDATA SRUCTURE

PRIMITIVE DATA STRUCTURES

PRIMITIVE DATA STRUCTURES

NON PRIMITIVE DATA SRUCTURE

NON PRIMITIVE DATA SRUCTURE

INTEGERINTEGER FLOATFLOAT POINTERPOINTER STRINGSSTRINGS LINEAR LINEAR NON LINEARNON LINEAR

STACKSTACK LISTLIST QUEUEQUEUEARRAYARRAY TREETREEGRAPHGRAPH

TABLEsPH

TABLEsPH

SETSET

Fig: 1.2


These are most sophisticated data structures which are derived from Primitive data structures .The non-primitive data structures emphasize on structuring of a group of homogeneous or heterogeneous data items.

Eg: Array, List, files, Stack, Queues etc.

The common operations done on data structure are categorized into four types:

1. CREATE – The create operations result in retrieving memory for the program elements which takes place either during compile time or run time.

2. DESTROY or DELETE – This option destroys the memory space allocated for a specified Data structures.

3. SELECTION – This Option deals with accessing a particular data within the data structure.

4. UPDATION – This modifies the data stored in the data structure.

The other options include,

a. Searching – The searching option finds the presence of a derived data item in a list of data items.

b. Sorting – is the process of arranging all data items in the data structure in a particular order, either in the descending order or ascending order.

c. Merging – is a process of combining the data items of two different data list into a single sorted list.

d. Traversing – refers to accessing or visiting through each element in a given list of items.

Non-primitive data structures are

ARRAYS Array is defined as a set of finite number of homogeneous elements or data items arranged in a linear order.

E.g.: int a [10]

Properties:

Every individual elements of an array can be accessed by specifying the name of array followed by index location.

The index of the first element is at location 0, That is, a[0] till a[n].

The elements are stored in consecutive memory locations.

The number of elements stored in an array is the size of the array represented b y ( Upper bound – Lower bound ) + 1

Arrays are read and written using ‘for’ loops.



For reading,

1. For (i=0;i<n;i++)

2.

3. scanf (“%d”,&a[i]);

4.

For Writing,

5. For (i=0; i<n; i++)

6.

7. printf(“%d”,a[i]);

8.

LIST A list or linear linked list is defined as variable number of data items .In dynamic representation of a list, every data item is represented in structures called ‘nodes’ where each node contains two fields. One field for storing ‘data ‘known as Information field and other for storing address of the node called as Pointer field.

Based on the type of linkages between the nodes there exists various types of node as,

Singly Linked list

Doubly Linked list

Circular linked list

Circular Doubly Liked list.

STACKS It is an ordered collection of linear elements, like an array where all insertions and deletions are done only at one end called TOP of the stack. Therefore, this data structure is also called LIFO structure (Last In First Out). Insertion of an element into the stack is called PUSH operation. Deletion of an element from the stack is referred as POP operation.


A B C D

Fig: 1.3


QUEUE A queue is a linear data structure where operations are done at both ends of the structure. It is of the type FIFO (First In First Out).

In the queue all insertions are done from the front end of the queue called FRONT, and all deletions are done at other end referred as REAR end.

TREE A tree is defined as a finite set of data items, which is represented using nodes. Tree is a non-linear type of data structure in which data items are arranged as sorted in a sorted sequence. A tree is represented as a hierarchal relationship between various elements in the set. The properties of a tree data structure are,

The special data item represented at the top of the tree is called root.

The remaining data items are partitioned into number of mutually exclusive or disjoint subsets and each subset represents a tree structure called sub tree. The left subset denotes left-sub tree and the right subset represents a right sub-tree.

The tree grows in length towards the bottom in the data structure.


TOPPUSH

POP

Fig: 1.4

45 57 80 90 23 77

FRONTREAR

Fig: 1.5

B C

A

ED F G

Right Sub treeLeft sub tree

Fig: 1.6


GRAPH

It is a mathematical, non-linear data structure capable of representing many kinds of physical structures, the graph is represented as G (V, E). Here, the graph G represents the set of all vertices denoted by V and the set of all edges denoted by E. Graph is classified into Directed, non-directed, Connected, non-Connected, simple and multi graphs.

ALGORITHM

Definition: An algorithm is a finite set of instructions to accomplish a particular task. All algorithms should have the following properties,

1. Input

2. Output

3. Definiteness: Each instruction is clear and ambiguous.

4. Finiteness: Algorithm should terminate after a number of steps.

5. Effectiveness: Every instruction must be basic enough to be carried out.

An algorithm can be distinguished from a program. A program may not be finite that is it may be non terminating. For e.g. operating system is a program that is waiting infinitely in a loop.

ALGORITHM SPECIFICATION

Pseudocode conventions

We describe an algorithm using a natural language like English, and the resulting instructions must be definite. Another possibility is Graphic representations called flowcharts, but it works only if the algorithm is small and simple. Pseudocode that resembles C also can be used.

Algorithm is defined as finite step by step list of well defined instructions for solving a problem. The algorithmic notation refers to the format of describing an Algorithm. This format of formal presentation of an Algorithm consists of two parts:

1) The first section tells the purpose of the Algorithm, identifies the variables which occur in the Algorithm and list of Input data.


A B

D C

G

Fig: 1.7


2) The second part of the Algorithm consists of list of steps that needs to be executed.

The Various conventions used in Algorithmic notation are:

# Comments begin with // and continue until the end of line.

# Blocks are indicated with matching braces: and.That is, a compound statement or a .

collection of simple statements can be represented as a block Statements are delimited by ‘;’

# Steps, Controls and Exit.

The steps of the Algorithm are executed one after the other beginning with step1 and executed downwards, ore control moves to the respective location depending on the step number, mentioned .i.e., The control can be transferred to any step n of the algorithm by the statement “Go to Step n” or such statements can be avoided by certain control structures.

An algorithm can obtain self contained modules and Three types of logic or flow of control called,

Sequence Logic or Sequential flow

Selection Logic or Conditional Flow.

Integration Logic or Repetitive Flow.

Sequence Logic

In selection logic all the modules are executed in sequence .the sequence may be referred explicitly by numbered steps or implicitly by the order in which modules are written.

Selection Logic

The selection logic employees a number of conditions which lead to a selection of one out of several alternative modules. The structures which implement this logic are called conditional structures or “If Structures”. The Statement ends with a structure statement as below.

[End of If structure]


Algorithm

Module A

Module B

Module C

Fig: 1.8


These conditional structures are of Three types:

1. Single Alternative : These structures are of the form :

If condition, then:

[Module A]


2. Double Alternative : It is of the form,

If condition, then:

[Module A]

Else :

[Module B]


3. Multiple Alternatives :

If condition (1), then:

[Module A1]

Else if condition (2), then:

[Module A2]

.

.

.

Else If Condition (M), then:

[Module Am]

Else :

[Module B]


Iterative Logic

The two types of structures involving loop are “For Loop” and “While Loop”. Each type begins with a “Repeat” statement and followed by a module, called the body of the loop. The end of structure is by the statement, [End of Loop]



1) Repeat For k = R to S by T:

[Module]

[End of Loop]

The Repeat – For loop uses an index variable as K, to Control the loop. The ‘R’ is called the initial values the end value or the test value, and T the increment.

2) The Repeat – while loop uses a condition to control the loop.

Repeat While condition:

[Module]

[End of Loop]

If several statements appear in the same step,

E.g. Set K = 1, LOC = 1 and MAX = DATA [1]

They are executed from left to right. The Algorithm is completed with the statement “Exit”.

3) Comments: Each step can contain comment in brackets which indicates the main purpose of the step. It usually appears at the beginning or at the end of the step.

# Variable names

Variable names will use capital letters as MAX, DATA etc. Single letter names of variables are used as counters or subscripts in capital letters (K and N) or lowercase letters (k and n).

# Assignment statement

The Assignment statement can use the formats of =, → ,: = etc.

<Variable>:= <expression>;

E.g.: MAX: = DATA [1], It assigns value of DATA [1] to MAX.

# Boolean values

There exist two Boolean values TRUE and FALSE. To produce this the logical operators and, or, and not and relational operators < ,<=,=,!= ,>= and > are used. Elements of multidimensional array are accessed using [ and ] .The (i,j)th element of the array is denoted as A[i,j],Array indices starts at zero.

# Input and Output

Data can be input and assigned to variables by means of a Read statement and output using Write statement.



Read: Variable names,

And print statement as,

Write: Messages or variable names.

# Procedures

The procedure is used for an independent algorithmic module to solve a particular problem.i.e, it is used to describe certain sub algorithms. They are completely independently defined algorithmic module which is used to invoke and call. It receives values, called arguments.

E.g.: NAME [PAR1, PAR2…………….PARK ]

# Every Algorithm is identified with a name and consists of heading and a body. The heading is of the form

Algorithm Name (<parameter list>)

Where ‘Name’ is the name of the procedure and ‘parameter list’ is a listing of the procedure parameters. The body consists of one or more simple or compound statements within and .An algorithm may or may not return values. Simple variables are passed by value and array or records are passed by reference.

Example1 (Algorithm 1.0): The following algorithm finds and returns the maximum of n given numbers:

1. Algorithm Max(A,n)

2. // A is an array of size n.

3.

4. Result := A[1];

5. for i:=2 to n do

6. if A[i] > Result then Result := A[i];

7. return Result;

8.

Recursive Algorithm

One of the efficient methods to make a program readable is to make it modular. That is, divide the program in to a number of functions. The view of the function is that it is invoked from one point and it is executed and returns the result to the invoking point. Recursion is the method by which a function calls itself repeatedly. An Algorithm is said to be recursive if it calls itself, or if the same algorithm is called in the body



Example (Algorithm 1.1): To find the factorial of a number

1. Int fact (n)

2.

3. If (n==0) then return 1

4. Else

5. Return n*fact (n-1);

6.

PERFORMANCE ANALYSIS AND MEASUREMENT

An Algorithm is defined as a sequence of steps which gives the method of solving a given problem and is the logic of the program. Performance analysis means we are analyzing the program before it is actually executed. In performance measurement, we are measuring the actual time taken to run the program by executing it in a system. Performance analysis of a program is done by analyzing the time and space requirements of a program because efficiency of an Algorithm depends on two criteria’s:

Run time

Space Requirement

COMPLEXITY OF ALGORITHMS

The complexity analysis of an algorithm is the measure of efficiency or performance in terms

of runtime and space.

Performance analysis: The algorithm can be judged based on certain criteria, for e.g.

a) Does it do what we want it to do?

b) Does it do according to the original specifications of the task?

c) Is there documentation that describes how to use it and how it works?

d) Are procedure created in such a way that they perform logical subfunctions?

e) Is the code readable?

Space complexity

A space complexity analysis analyzes, how much memory space does each program occupies to run to completion. Actually we are not calculating the actual space required for a program, we are just taking an estimate.

The space needed by each of the programs is the sum of the following components.



1. A fixed part that is independent of the characteristics (number and size of the inputs and

outputs, space for the program code, space for variables, constants etc.

2. A variable part that consists of the space needed for the variables that is dependent on the

particular problem instance being solved, the space needed by the referenced variables and

the recursion stack space.

Illustration: To illustrate how to find space complexity.

(Algorithm 1.3) Algorithm abc (a, b, c)

return a + b + b * c + (a + b- c)/ (a + b) +4.0;

The space needed by the algorithm is the sum of the following components:1. A fixed part that is independent of the characteristics (e.g., number, size) of inputs and

outputs. This basically includes the instruction space, space for simple variables and fixed-size component variables, space for constants etc.

2. A variable part that consists of the space needed by component variables whose size is dependent on the particular problem instance being solved, space needed by the reference variables, and recursion space etc.

The space requirement S(P), of a program P, thus is written as S(P)=c+Sp(Instance characteristics).Where ‘c’ consists of fixed part or a constant and Sp is instance characteristics. Since are taking only an estimate of the space complexity, we shall just consider only the Sp. Thus, in the above Algorithm, the problem instance is characterized by the specific values of a, b, and c.With assumption that one word is sufficient to store the values of each of a,b,c,and the result ,therefore the space needed by abc is independent of the instance characteristics ,Sp = 0.

Time complexity

The time taken by a program, T (P), is the sum of compile time and run time or the computer time it needs to run to completion. Compile time doesn’t depend on instance characteristics. Because a compiled program can be run many times without recompilation. So run-time only is considered to analyze time complexity. The time taken for a program depends on the time taken to execute each statement of the program and also the complexity of each of the statement is different. So time for a program depend on the number of additions, subtractions, multiplications, assignments etc.

That is, tp(n) = caADD(n) + csSUB(n)+cmMUL(n)+cdDIV(n)+………where n denotes the instance characteristics, and ca, cs, cm, cd, respectively ,denote the time needed for an addition, subtractions, multiplications, assignments etc.

For e.g. A=b+c

D=a+b*c/r+s+f

The time taken for the second statement is greater than the first. To evaluate the actual time by looking at the program is not possible, it is assumed that each meaningful step of the



program takes one unit of time to execute. So counting the number of steps in the program is done to find the total time taken.

If space of any two algorithms is considered to be constant, then the complexity depends on runtime. The runtime of an execution depends on the parameters such as,

ж Number of comparisons

ж Number of data exchanges.

ж Presence of recursive executions.

ж The Runtime is proportional to the size of the program.

ж The execution time increases as the Input data set increases in size.

Illustration: To illustrate how to find Time complexity. It is done by two methods as,

1. Step count method

2. Step per statement execution or frequency method

STEP COUNT METHOD

Here we introduce a variable ‘count’ in to the program. As each time a particular

meaningful statement is executed count increments by one.

For e.g.:( Algorithm 1.4)

1. Sum (int *a, int n)

2.

3. int i;

4. int a=0;

5. for(i= 0 to n-1)

6.

7. a[i]=i;

8.

9.

After introducing the variable ‘count’ in to the program as,

(Algorithm 1.5)

1. Sum(int *a, int n)

2.



3. int i;

4. int a=0;

a. count++; /* for the statement int a=0 */

5. for(i= 0 to n-1)

6.

a. count++; /* for for statement*/

7. a[i]=i;

a. count ++; /* for a[i]=i */

8.

a. count ++; /*for the last time of for statement */

9. return i;

a. count ++; /* for return */

10.

After the executing the above program we get the value 2n+3 in the variable count. We can see that number of steps in this program depends on the variable n.

STEP PER STATEMENT EXECUTION OR FREQUENCY METHOD (STEP TABLE

METHOD)

It determines how the computing time increases as the magnitude of the input increases. In

such case the step count (that is s/e, step per statement execution) of an algorithm can be

computed as a function of the magnitude if the input. In the Algorithm illustrated below Sum, the

time complexity is measured as a function of the number ‘n’ of element being added.

The complexity analysis defines three cases of step count based on runtime,

¡ Best case

¡ Worst case

¡ Average case

Best caseIf the algorithm finds the element at the first search itself, it is referred as a best case

algorithm.



Worst case

If the algorithm finds the element at the end of the search or if the searching of the element fails, the algorithm is in the worst case or it requires maximum number of steps that can be executed for the given parameters.

Average case

The analysis of average case behavior is complex than best case and worst case analysis, and is taken by the probability of Input data. As the volume of Input data increases, the average case algorithm behaves like worst case algoprithm.In an average case behavior; the searched element is located in between a position of the first and last element.

ASYMPTOTIC NOTATION ( O , Ω , )

The Asymptotic notation introduces some terminology that enables to make meaningful statements about the time and space complexities of an algorithm. The functions f and g are nonnegative functions.

O – NOTATION (Rate of Growth)

The O-notation (pronounced as Big “Oh”) is used to measure the performance of an algorithm which depends on the volume of Input data. The O-notation is used to define the order of growth of an algorithm, as the input size increases, the performance varies.

The function f (n) =O (g (n)) (read as f of n is big oh of g of n) iff there exists two positive constants c and n0 such that f (n) <=cg (n) for all n, where n>=n0.Suppose we have a program having count 3n+2. We can write it as f (n) =3n+2. We say that 3n+2=O (n), that is of the order of n, because f (n) <=4n or 3n + 2 <= 4n, for all n>=2.


Statement s/e frequency total time

Algorithm Sum(a,n) 0 - 0

0 - 0

s :=0.0; 1 1 1

for := 1 to n do 1 n+1 n+1

s := s + a[i]; 1 n n

return s; 1 1 1

0 - 0

Total 2n + 3

Table 1.3


Another e.g. Suppose f (n) =10n2+4n+2. We say that f (n) =O (n2) sine 10n2+4n+2<=11n2 for n>=2 But here we can’t say that f (n) =O (n) since 10 n2+ 4n+2 never less than or equal to cn,That is ,10n2+4n+2 != O (n) , But we are able to say that f (n) =O (n3) since f (n) can be less than or equal to cn3, same as 10n2+4n+2<=10n4, for n>=2.

Normally suppose a program has step count equals 5, and then we say that it has an order O (constant) or O (1).

If f (n) = 3n+5 then O (n) or O (n2) or O (n3) or ---But we cannot express its time complexity as O (1).

If f (n) = 5n2+8n+2 then O (n2) or O (n3) or ---But we cannot express its time complexity as O (n) or O (1).

If it is 6n3+3n+2 then O (n3) or O (n4) or ----But we cannot express its time complexity as O (n2) or O (n) or O (1).

If it is n log n+6n+9 then O (n log n) (to the base 2)If it is log n+6 then O (log n)

The seven various O-notations used are:

O (1) – We write O (1) to mean a computing time that is constant. Where the data item is searched in the first search itself.

O (n) – O(n) is called linear, where all the elements of the list are traversed and searched and the element is located at the nth location.

O (n2) – It is called quadratic, where all elements of the list are traversed for each element. E.g: worst case of bubble sort.

O (log n) –The O(log n) is sufficiently faster, for a large value of n, when the searching is done by dividing a list of items into 2 half’s and each time a half is traversed based on middle element.E.g: binary searching, binary tree traversal.

O (nlogn) – The O (nlogn) is better than O(n2 ),but not good as O(n),when the list is divided into 2-halfs and a half is traversed each time.E.g: Quick sort.

The other notations are O (n2), O (n3), O (2n) etc.

OMEGA NOTATION ( )

The function f (n) = (g(n)) (read as “ f of n is omega of g of n” ) iff there exists two positive constants c and n0 such that f(n) >= c * g(n) for all n, where n >= n0.

Suppose we have a program having count 3n+2. we can write it as f(n)=3n+2. And we say that 3n+2=(n), that is of the order of n, because f(n)>=3n or 3n+2>=3n, for all n>=1.

Normally suppose a program has step count equals 5, and then we say that it has an order (constant) or (1).



If f(n) = 3n+5 then (n) or (1)But we cannot express its time complexity as O(n2).

If f(n) = 5n2+8n+2 then (n2) or (n) or (1)But we cannot express its time complexity as O(n3) or O(n4).

If f(n) = 6n3+3n+2 then (n3) or (n2) or (n) or (1)But we cannot express its time complexity as O(n4)

If f(n) = n log n+6n+9 then (n log n) (to the base 2)If f(n) = log n+6 then (log n)

THETA NOTATION ()

The function, f(n) = (g(n)) (read as f of n is big oh of g of n) iff there exists two positive constants c1, c2 and n0 such that c1.g(n) >= f(n) <= c2.g(n) for all n, where n >= n0.

Suppose we have a program having count 3n+2. we can write it as f(n)=3n+2. We say that 3n+2= (n), that is of the order of n, because c1. n >= 3n + 2 <= c2. n .after a particular value of n.But we cannot say that 3n + 2 = (1) or (n2) or (n 3).

This can be explained by the following graph.

Normally suppose a program has step count equals 5, then we say that it has an order (constant) or (1).If f(n) = 3n+5 then (n) But we cannot express its time complexity as (n2) or (1).

If f(n) = 5n2+8n+2 then (n2).But we cannot express its time complexity as (n3) or (n4) or (n) or (1).

If f(n) = 6n3+3n+2 then (n3) But we cannot express its time complexity as O(n4) or (n2) or (n) or (1).

If f(n) = n log n+6n+9 then (n log n) (to the base 2)If f(n) = log n+6 then (log n)

From this we may conclude that is the most precise notation.For the binary search program we will get a time of the order log n. (to the base 2)

ARRAYS

Definition An array is defined as a finite, ordered collection of homogeneous data elements .the number of elements in the array is size of the array.

Array is also defined to be consecutive set of memory locations.The kind of data stored in the array is the data type of the array. Base of the array is the address of the memory locations where the first element in the array is located. Index or subscript value refers to the position of each element which is represented in square bracket followed by a array name.



The memory allocation for an array is calculated using the formula,

Address (A[i]) =M + (I - L) * W

Where M – Base address

I – Position whose memory address needs to be computed.

L – Lower bound Address

W – Word length.

ALGORITHMIC NOTATION OF AN ARRAY

(Algorithm 1.6)

Structure ARRAY (value, index)

Declare CREATE ( ) → array

RETRIEVE (array, index) → value

STORE (array, index, value) → array

For all A array, i, j index, x value

RETRIEVE (CREATE, i) :: error

RETRIEVE (STORE (A, i, x), j) :: =

IF EQUAL ( i , j ) then x else RETRIEVE ( A, j)

End

End ARRAY.

Explanation:

The function CREATE produces a new, empty array. RETRIEVE takes as input an array

and an index, and either returns the appropriate value or an error. STORE is used to enter new

index-value pairs. RETRIEVE (STORE (A,I,x),j) is explained as ‘to retrieve the j th item where x

has already been store at index i in A, which is equilavaent to checking if I and j are equal and if

so ,x,or is searched for the jth value in the remaining array, A.


45 57 80 90 23 77A [6]

Fig: 1.9


The various operations done on an array includes,

Insertion

Deletion

Sorting

Searching

An Array is classified as

1) 1-Dimensional Array.

2) 2-dimensional Array.

3) Multi-Dimensional array.

REPRESENTATION OF ARRAYS

1-Dimensional Array

A 1-d array is an array where all elements are referenced using only one subscript or index value. It is very important to see how arrays are represented in memory. Computer memory can be considered as one-dimensional array of locations from 1 to m. So we are going to see how the multi dimensional arrays are represented in memory. Suppose a one dimensional array declared as int a[20];The index of the array starts from 0 and ends at 19. The number of elements in the array is 20. So the number of elements in the array declared as int a[i] is i+1.

Suppose we have the one dimensional array declared as int a[5]. Then the elements in the array are a[0], a[1], a[2], a[3], a[4]. The address of the first location, a[0] is the base address ‘a’. Then address of the location a[1] is a+1. the address of a[4] is a+4. So if we have an element a[i], then the address of that element a[i] will be a+i. (Here we assume that one element occupies one location or one byte in memory. Actually an integer element occupies 2 locations or 2 bytes in memory and a character element occupies 1 location).

Operations on a 1-Dimensional ArrayInsertion into a 1-Dimensional Array(Algorithm 1.7)

Algorithm INSERT (LA, N, K, ITEM)

//LA is a Linear array with N elements and K is any Position with condition K<=N.The Algorithm inserts an element into kth position of LA //

Set j:=N

Repeat While j>=K

Set LA [j+1] := LA[j]

Set j: = j-1



End While

Set LA [K]:=ITEM //Insert ITEM //

Set N: = N+1 //Reset value of N //

Exit.

Deletion into a 1-Dimensional Array(Algorithm 1.8)

Algorithm DELETE (LA, N, K, ITEM)

//LA is a Linear array where N number of elements are present and K is the index variable, such that K<=N.The Algorithm deletes Kth element from LA //

Set ITEM: = LA [K]

Repeat for J=K to N-1

Set LA[j]:= LA [J+1 //move J+1 element upward//

End for

Set N=N-1 //Reset the number N of elements in LA //

Exit.

2-dimensional Array.

A 2-d array is a collection of homogeneous elements where elements are ordered into a number of rows and columns. Consider the case of a 2 dimensional array declared as int a[4][5]. We can represent this as a matrix consisting of 4 rows and 5 columns. We can say that each row of this matrix consists of 5 elements.

The two dimensional array is represented in the Figure: 1.10 given below.


0 1 2 3 4


The 2 dimensional array A[u1] [u2] may be interpreted as u1 rows, row0, row1, …, row

u1-1, each row consisting of u2 elements. Suppose @ is the address of A [0] [0], then the address

of A [i] [0] is @+i*u2, as there are i rows , each of size u2, preceding the first element in the i th

row. Knowing the address of A [i] [0], we can say that address of A [i] [j] is then simply as @ + i

* u2 + j.

E.g.: An m x n matrix where m number of rows, n number of Columns is present is a representation 2-d array.

The total number of elements in the above matrix =m*n.Here, in the matrix A each element is specified using 2 index variables i and j,That is, A [i][j]

A matrix can be represented in the memory using continuous memory locations .The 2 formats for storing a matrix in the memory are,

1) Row – major ordering

2) Column – major ordering


X X X X X

X X X X X

X X X X X

X X X X X

A =

1 2 3

4 5 6

7 8 9m =3, n= 3

Fig: 1.10

Fig: 1.11

Fig: 1.10

0

1

2

3


In the row – major ordering, the elements in a matrix, is sorted in a row by row manner. In a column major ordering, the elements are stored as column by column order.

Example

The 2-d Matrix can be illustrated using the Example in the Figure 1.11, of sparse Matrix.

Sparse matrix is as type of matrix stored in 2-d array where the value of the majority of elements is ‘zero’ or ‘null ‘.

E.g.:

In a sparse matrix, since there are few non-zero entries using. Using such 2-d array in the memory results in the disadvantages such as,

a) Wastage of memory space

b) Increases runtime in execution, and therefore decreases the efficiency.

In order to have an effective representation of a sparse matrix, it is stored in the memory using an alternative format known as 3-Tuple method.

Here, every non-zero entry is stored in a 2-d array with reference to its roe value and column value .ie,it is represented in the form as,(i,j,value).If the number non-zero entries represented in a sparse matrix is equal to ‘t’,then the row index ‘i’ will range from 0 to ‘t’ and the column index variable will range from 1 to 3 .i.e,represented as A(0 : t,1 : 3 ),which means ‘t’ non-zero items are stored into 3 various coloumns.This representation reduces memory wastage and reduces runtime. The type of operations done on a 2-d array includes searching,sorting,traversing,merging etc.Another one operation performed on a matrix is computing the transpose martrix,where A( i, j,value) is stored as B( j, i, value).

The matrix in the Figure 1.12 represents a 6 x 6 matrix of the form sparse. In order for the effective representation of this sparse matrix in the memory, it needs to be represented in the 3-tuple format. It requires 9 rows and 3 columns, As there are 8 non-zero elements, The first row represents the number for the representation of the row order(m),column order(n) ,and the rest 8 rows represent the nonzero value and its row index and column index. This matrix is told to be 3-tuple format, because, the number of columns has the row number, column number and the last column has the non-zero terms. That is, the entire 6 x 6, 36 entries in the memory, which actually


A =

0 2 0

4 5 0

0 0 0m =3, n= 3

Fig: 1.12


contains only 8 values, are reduced into the form with 9 columns and 8 entries. The 3-tuple matrix is represented in the Figure: 1.13.

The Transpose of the above matrix A can be computed using the procedure TRANSPOSE (A, B), where A is the original matrix in 3-tuple form and B is the Transpose of A as represented in Figure: 1.14.

(Algorithm 1.9)

Procedure TRANSPOSE (A, B)

// A is a matrix represented in sparse form and B is said to be its transpose.

( m, n ,t) ← ( A(0,1) ,A(0,2) ,A(0,3) )

( B ( 0,1) , B( 0,2) , B( 0,3) ) ← ( n, m, t )

If t <= 0 then return //check for zero elements

q← 1 //q is the position of next term in B

for col←1 to n do //transpose by column

for p←1 to t do //for all non-zero terms do

if A(p ,2)= col //correct column

then [ (B(q,1) B(q,2) B(q,3) ) ← ( A(p,2) , A(p,1), A(p,3) ) //insert next term of B


Row 1

Row 2

Row 3

Row 4

Row 5

Row 6

Col 1 Col 2 Col 3 Col 4 Col 5 Col 6

15 0 0 22 0 -15

0 11 3 0 0 0

0 0 0 -6 0 0

0 0 0 0 0 0

91 0 0 0 0 0

0 0 28 0 0 0

Fig: 1.13


q←q+1 ]

End

End.

End TRANSPOSE.

The transpose Matrix B of matrix A is represented below.


A (0

1 2 3

A (1

A (2

A (3

A (4

A (5

A (6

6 6 8

1 1 15

1 4 22

1 6 -15

2` 2 11

2 3 3

3 4 -6

5 1 91

6 3 28

A (7

A (8

Fig: 1.14


Three-dimensional Array

The 3-D array assumes 3 dimensions to refer to a value stored.ie, the number of rows,

number of columns, and number of pages. The element in a ijk assumes ith row, jth column, kth page

as illustrated in figure:1.15. Consider a three dimensional array A [u1] [u2] [u3]. The array is

interpreted as u1 two dimensional arrays of dimension u2 x u3. to locate A[I] [j] [k], we first

obtain @ + I u2 u3 + j u3 + k as the address of A [I] [j] [k].

Generalizing this, the addressing formula for any element A [i1] [i2] ….[in] in an n dimensional

array declared as A [u1] [u2] ….[un] may be easily obtained. If @ is the address for A [i] [0] …

[0] then @ + i1 u2 u3….un is the address for A [i1] [0] …[0]. The address for A [i1] [i2] [0] ….

[0] is then @ + i1 u2 u3…un + i2 u3 u4 … un.

Repeating this way, address for A [i1] [i2] … [in] is

@ + i1 u2 u3 …un

+ i2 u3 u4 … un

+ i3 u4 u5 … un

…

…

…

+ in-1 un

+in


B (0

1 2 3

B (1

B (3

B (4

B (5

B (6

6 6 8

1 1 15

1 4 22

1 6 -15

2` 2 11

2 3 3

3 4 -6

5 1 91

6 3 28

B (7

A (8Fig: 1.15


ORDERED LISTS

Definition: An ordered list is told to be a collection of related set of elements or a set of data objects also called linear list. Examples are the days of a week forms a ordered list as (MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY,SUNDAY) etc or floors of a building as (basement,lobby,first,second,third) etc.The ordered list is either empty or can be represented as (a1,a2,a3,………..an)where ai are atoms from the set S.There are a variety of operations that can be performed on the list, they include,

a) To find the length of the list ,n

b) To read the list from left to right and from right to left.

c) To retrieve the ith element,1<=i<=n

d) To store a new value into the ith position,1<=i<=n

e) To insert a new value in the ith position,1<=i<=n+1,making the rest the elements shifted to the next positions.

f) To delete the element in position i,1<=i<=n causing the elements numbered in i+1,…n shifted to the previous locations as n-1.

The most common way to represent the ordered list is using an array, where the list element ai is at location i of the array index. The ordered list can be sequentially mapped by storing the ai and ai+1 into consecutive locations of i and i+1 of the array. This makes the users capable to retrieve or modify the values in a random order in constant amount of time.

Ordered list processing

The problem requiring to process the ordered list can be done using the 1-d arrays. The classical example for the List processing technique is the manipulation of symbolic polynomials. This refers to the 4 basic operations as Addition, subtraction, multiplication and division that could be done on 2 polynomials. Polynomials are told to be symbolic as it contains the list of coefficients and exponents in any given polynomial.

A polynomial is defined to be a sum of terms where each term is of the form axe ,here ‘x’ is a variable,’ a’ is the coefficient and ‘e’ is the exponent.

A general polynomial A(x) is of the form,


K

J

i

Fig: 1.16


Anxn+an-1xn-1+…+a1x +a0 where an ≠0,the degree of A is n.To represent A(x) an ordered list of coefficients using a one dimensional array of length n+2 is used.

Example: Consider two polynomials A(x) = 3x2 +2x + 4 and B(x) =x4 +10x3 +3x2 +1

In the array representation of the polynomial it uses 2m+1 locations, where m is the number of terms in the polynomial and the first entry in the array represents the total number of terms.

The resultant polynomial C contains 2(m+n)+1) locations.

The addition of the 2 polynomials can be represented using the procedure PADD.(Algorithm 1.10)

Procedure PADD (A, B, C)

// A (1:2m+1), B (1:2n+1), C (1:2(m+n) +1) //

m ← A(1);

n ← B(1);

p ← q ← r ← 2;

While p<=2m and q<=2n do

Case // Compare exponents //

: A (p) = B (q) : C (r+1) ← A (p+1) + B (q+1) //add coefficients //

if C (r+1) ≠ 0

then [C(r) ←A (p); r ← r + 2] //Store exponent //

p ← p + 2 ; q ← q + 2 //advance to next terms //

: A (p) < B (q) : C (r+1) ← B (q+1); C(r) ←B(q) // Store new term //


3 2 3 1 2 0 4A [X]

4 4 1 3 10 2 3 0 1B[X]

Fig: 1.17

7 4 1 3 10 2 6 1 2 0 5

Fig: 1.18


q ← q + 2; r ← r + 2 //advance to next terms //

: A (p) > B (q) : C (r+1) ← A (p+1) ;C(r) ←A(p) // Store new term //

p ← p + 2; r ← r + 2 //advance to next terms //

End

End

While p <= 2m do //Copy remaining terms of A//

C( r) ←A (p);

C(r+1) ←A (p+1)

p ←p+2; r← r+2;

End

While q<=2n do //Copy remaining terms of B//

C( r) ←B (q);

C(r+1) ←B (q+1)

q ←q+2; r← r+2;

End

C (1) ← r/2 -1 //Number of terms in the sum //

End PADD

Explanation

The procedure has the Capital letter parameters which are array names. The 3 pointers (p, q, r) are used to designate a term in A, B or C.The basic iteration is governed by a While loop and blocks of statements are grouped together using square brackets. Variables ‘m’ and ‘n’ refers to the nonzero terms in A and B respectively.


Module 1

Documents

organization of data

term data

overview data

structuring of data

processed data

data data type38 integers

data fig

transformation of data