Data structures and algorithms


Abstract data type
A theoretical description of an algorithm, when realized in an application, is strongly affected by computer resources, implementation, and data. Such a theory includes fundamental concepts: the abstract data type (ADT), data type, or data structures, as tools to express the operations of algorithms; the computational resources to implement the algorithm and test its functionality; and the evaluation of the complexity of algorithms.

What is a Data Type?
- A name for the data type, e.g., int for the INTEGER data type
- A collection of (possible) data items, e.g., integers can have values in the range of -2^31 to 2^31 - 1 (for a 32-bit int)
- An associated set of operations on those data items, e.g., arithmetic operations like +, -, *, /, etc.

Abstract data type
An abstract data type (ADT) is defined as a mathematical model of the data objects that make up a data type, as well as the functions that operate on these objects (and logical or other relations between objects). An ADT consists of two parts: data objects and operations on data objects. The term data type refers to the implementation of the mathematical model specified by an ADT. The term data structure refers to a collection of computer variables that are connected in some specific manner. The notion of data type includes basic data types, which are related to a programming language.

Example: Integer Set Data Type
ABSTRACT (theoretical) DATA TYPE:
- Mathematical set of integers, I
- Possible data items: -inf, ..., -1, 0, 1, ..., +inf
- Operations: +, -, *, mod, div, etc.
Actual, implemented data type (available in C++):
- It's called int
- Only has a range of -2^31 to 2^31 - 1 for the possible data items (instead of -inf to +inf), for a typical 32-bit int
- Has the same arithmetic operations available
What's the relationship/difference between the ADT and the actual, implemented data type in C++? The range of possible data items is different.
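The difference in range is easy to observe in C, whose int has the same limits as C++'s on typical 32-bit platforms. The sketch below uses INT_MIN and INT_MAX from <limits.h>; the helper name checked_add is hypothetical. Because signed overflow is undefined behavior in C, the operands are tested against the limits before adding, so an operation that is always defined on the mathematical set I can fail on the implemented type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Adds a and b into *result only if the true sum fits in an int.
   Returns false (and leaves *result untouched) when it would not,
   i.e. when the result lies outside [INT_MIN, INT_MAX]. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* overflow: not representable by the implemented type */
    }
    *result = a + b;
    return true;
}
```

For example, checked_add(INT_MAX, 1, &r) returns false, whereas checked_add(2, 3, &r) succeeds and stores 5 in r.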

The THREE Essentials
- ABSTRACT (theoretical) DATA TYPE: e.g., the mathematical class I in our example.
- Actual IMPLEMENTED DATA TYPE: what you have in C++; for example, int in our example.
- INSTANTIATED DATA TYPE, i.e., DATA STRUCTURE: e.g., x in int x = 5; (in our example). It stores (or structures) the data item(s); it can be a variable, array, object, etc., and holds the actual data (e.g., a specific value).
In C++ terms: ADT: class DECLARATION (lib.h); implemented type: class DEFINITION (lib.cpp); instantiated type: object (project.cpp).

Implementation of an ADT
The data structures used in implementations are EITHER already provided in a programming language (primitive or built-in) OR built from the language constructs (user-defined).

In either case, successful software design uses data abstraction:

Separating the declaration of a data type from its implementation.


Summary of ADT
(Abstract or Actual) Data Types have three properties:
- Name
- Possible data items
- Operations on those data items
The data type declaration goes in the .h (header) file, e.g., the class declaration.
The data type definitions go in the .cpp (implementation) file, e.g., the class definition.
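In C, which the linked-list code later in these notes uses, the same separation is usually done with a header that declares an opaque type and a source file that defines it. The sketch below keeps both halves in one listing for convenience; the Counter ADT and its function names are hypothetical, chosen only to illustrate the split. Client code sees only the name and the operations, never the representation.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Declaration part: would normally live in counter.h ---- */
typedef struct counter *Counter;       /* opaque: the fields are hidden from clients */
Counter counter_create(void);          /* returns NULL if memory runs out */
void    counter_increment(Counter c);
int     counter_value(Counter c);
void    counter_destroy(Counter c);

/* ---- Definition part: would normally live in counter.c ---- */
struct counter {
    int value;                         /* representation known only to this file */
};

Counter counter_create(void)
{
    Counter c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
    }
    return c;
}

void counter_increment(Counter c) { c->value++; }

int counter_value(Counter c) { return c->value; }

void counter_destroy(Counter c) { free(c); }
```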

Stacks
Stacks are a special form of collection with LIFO semantics.
Two methods:
- int push( Stack s, void *item ); - add item to the top of the stack
- void *pop( Stack s ); - remove an item from the top of the stack
Like a plate stacker.
Other methods:
- int IsEmpty( Stack s ); /* Return TRUE if empty */
- void *Top( Stack s ); /* Return the item at the top, without deleting it */

Stacks
This ADT covers a set of objects as well as the operations performed on these objects: Initialize(S) creates the necessary structured space in computer memory to hold the objects of S; Push(x) inserts x into S; Pop deletes the object that was most recently inserted into the stack; Top returns the object that was most recently inserted, without removing it; Kill(S) releases the memory occupied by S. The operations on stack objects obey the LIFO property: Last-In-First-Out. This is a logical constraint, or logical condition. The operations Initialize and Kill are oriented more toward an implementation of this ADT, but they are important in some algorithms and applications too. The stack is a dynamic data set with limited access to its objects.

Stacks - Implementation
Arrays:
- Provide a stack capacity to the constructor
- Flexibility is limited, but this matches many real uses: capacity is limited by some constraint (memory in your computer, size of the plate stacker, etc.)
- The push and pop methods are variants of AddToCollection and DeleteFromCollection
A linked list is also possible.
Stack: basically a Collection with special semantics!

Array Stack Implementation
We can use an array of elements as a stack. The top is the index of the next available element in the array.
[Figure: T[] stack, an array holding objects of type T, with the integer top marking the next free slot; unused slots are null]
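A minimal C sketch of the array implementation, following the interface on the earlier Stacks slide (push, pop, Top, IsEmpty) plus Initialize and Kill. The names StackInit and StackKill, and the choice to return NULL from pop and Top on an empty stack, are assumptions for illustration. As on the slide, top is the index of the next available slot, so it also equals the number of items stored.

```c
#include <assert.h>
#include <stdlib.h>

#define TRUE  1
#define FALSE 0

struct stack {
    void **items;   /* array of item pointers */
    int capacity;   /* fixed size chosen at creation */
    int top;        /* index of the next free slot = number of items */
};
typedef struct stack *Stack;

/* Initialize(S): allocate the array; returns NULL if memory runs out. */
Stack StackInit(int capacity)
{
    Stack s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->items = malloc((size_t)capacity * sizeof *s->items);
    if (s->items == NULL) { free(s); return NULL; }
    s->capacity = capacity;
    s->top = 0;
    return s;
}

int IsEmpty(Stack s) { return s->top == 0; }

/* Push: returns FALSE when the stack is full (capacity reached). */
int push(Stack s, void *item)
{
    if (s->top == s->capacity) return FALSE;
    s->items[s->top++] = item;
    return TRUE;
}

/* Pop: remove and return the most recently pushed item, or NULL if empty. */
void *pop(Stack s)
{
    if (IsEmpty(s)) return NULL;
    return s->items[--s->top];
}

/* Top: return the most recent item without removing it, or NULL if empty. */
void *Top(Stack s)
{
    return IsEmpty(s) ? NULL : s->items[s->top - 1];
}

/* Kill(S): release the memory occupied by the stack. */
void StackKill(Stack s)
{
    free(s->items);
    free(s);
}
```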

Linked Stack Implementation
We can use the same LinearNode class that we used for the LinkedSet implementation. We change the attribute name to top, to have a meaning consistent with a stack.
[Figure: top refers to a chain of LinearNode objects, each with a T element and a LinearNode next, ending in null; the integer count holds the number of elements]
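The slides describe a Java-style LinearNode class; the following is a hypothetical C translation (the names ls_init, ls_push, ls_pop and ls_peek are illustrative). Each push allocates a node that points at the old top, so push and pop run in constant time and the only capacity limit is available memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Node with the two fields shown on the slide: element and next. */
typedef struct linear_node {
    void *element;
    struct linear_node *next;
} LinearNode;

typedef struct {
    LinearNode *top;   /* most recently pushed node; NULL when empty */
    int count;         /* number of elements on the stack */
} LinkedStack;

void ls_init(LinkedStack *s) { s->top = NULL; s->count = 0; }

/* Push: the new node becomes the top and links to the old top.
   Returns 0 if memory runs out, 1 otherwise. */
int ls_push(LinkedStack *s, void *element)
{
    LinearNode *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->element = element;
    n->next = s->top;
    s->top = n;
    s->count++;
    return 1;
}

/* Pop: unlink the top node, free it, and return its element (NULL if empty). */
void *ls_pop(LinkedStack *s)
{
    LinearNode *n = s->top;
    void *element;
    if (n == NULL) return NULL;
    element = n->element;
    s->top = n->next;
    free(n);
    s->count--;
    return element;
}

/* Peek: the top element without removing it (NULL if empty). */
void *ls_peek(const LinkedStack *s) { return s->top ? s->top->element : NULL; }
```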

The N-Queens Problem
Suppose you have 8 chess queens...
...and a chess board.

We'll start with a description of a problem which involves a bunch of queens from a chess game, and a chess board.

The N-Queens Problem
Can the queens be placed on the board so that no two queens are attacking each other?

Some of you may have seen this problem before. The goal is to place all the queens on the board so that none of the queens are attacking each other.

The N-Queens Problem
Two queens are not allowed in the same row...

If you play chess, then you know that this forbids two queens from being in the same row...

The N-Queens Problem
Two queens are not allowed in the same row, or in the same column...

...or in the same column...

The N-Queens Problem
Two queens are not allowed in the same row, or in the same column, or along the same diagonal.

...or along the same diagonal.

As a quick survey, how many of you think that a solution will be possible? In any case, we shall find out, because we will write a program to try to find a solution.

As an aside, if the program does discover a solution, we can easily check that the solution is correct. But suppose the program tells us that there is no solution. In that case, there are actually two possibilities to keep in mind:
1. Maybe the problem has no solution.
2. Maybe the problem does have a solution, and the program has a bug!
Moral of the story: always create an independent test to increase the confidence in the correctness of your programs.
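One way to build such an independent test is a checker that only verifies a proposed answer and shares no code with the search. In this hypothetical C sketch, a solution is given as col[r], the column of the queen in row r (0-based), so two queens can never share a row; the checker compares every pair for a shared column or diagonal. Two queens lie on the same diagonal exactly when their row distance equals their column distance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Independently verifies an N-Queens placement.
   col[r] is the 0-based column of the queen in row r, for r = 0..n-1. */
bool is_valid_solution(const int col[], int n)
{
    for (int r1 = 0; r1 < n; r1++) {
        if (col[r1] < 0 || col[r1] >= n) return false;            /* off the board */
        for (int r2 = r1 + 1; r2 < n; r2++) {
            if (col[r1] == col[r2]) return false;                  /* same column */
            if (abs(col[r1] - col[r2]) == r2 - r1) return false;   /* same diagonal */
        }
    }
    return true;
}
```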

The N-Queens Problem
The number of queens and the size of the board can vary: N queens, N rows, N columns.

The program that we write will actually permit a varying number of queens. The number of queens must always equal the size of the chess board. For example, if I have six queens, then the board will be a six by six chess board.

The N-Queens Problem
We will write a program which tries to find a way to place N queens on an N x N chess board.

At this point, I can give you a demonstration of the program at work. The demonstration uses graphics to display the progress of the program as it searches for a solution.

During the demonstration, a student can provide the value of N. With N less than 4, the program is rather boring. But N=4 provides some interest. N=10 takes a few minutes, but it is interesting to watch and the students can try to figure out the algorithm used by the program.

How the program works
The program uses a stack to keep track of where each queen is placed.

I want to show you the algorithm that the program uses. The technique is called backtracking. The key feature is that a stack is used to keep track of each placement of a queen.

How the program works
Each time the program decides to place a queen on the board, the position of the new queen is stored in a record which is placed in the stack.

ROW 1, COL 1

For example, when we place the first queen in the first column of the first row, we record this placement by pushing a record onto the stack. This record contains both the row and column number of the newly-placed queen.

How the program works
We also have an integer variable to keep track of how many rows have been filled so far.

ROW 1, COL 1

filled = 1

In addition to the stack, we also keep track of one other item: an integer which tells us how many rows currently have a queen placed.

How the program works
Each time we try to place a new queen in the next row, we start by placing the queen in the first column...

ROW 1, COL 1

filled = 1

ROW 2, COL 1

When we successfully place a queen in one row, we move to the next row. We always start by trying to place the queen in the first column of the new row.

How the program works
...if there is a conflict with another queen, then we shift the new queen to the next column.

ROW 1, COL 1

filled = 1

ROW 2, COL 2

But each new placement must be checked for potential conflicts with the previously placed queens. If there is a conflict, then the newly-placed queen is shifted rightward.

How the program works
If another conflict occurs, the queen is shifted rightward again.

ROW 1, COL 1

filled = 1

ROW 2, COL 3

Sometimes another conflict will occur, and the newly-placed queen must continue shifting rightward.

How the program works
When there are no conflicts, we stop and add one to the value of filled.

ROW 1, COL 1

filled = 2

ROW 2, COL 3

When the new queen reaches a spot with no conflicts, then the algorithm can move on. In order to move on, we add one to the value of filled...

How the program works
Let's look at the third row. The first position we try has a conflict...

ROW 1, COL 1

filled = 2

ROW 2, COL 3

ROW 3, COL 1

...and place a new queen in the first column of the next row.

How the program works
...so we shift to column 2. But another conflict arises...

ROW 1, COL 1

filled = 2

ROW 2, COL 3

ROW 3, COL 2

In this example, there is a conflict with the placement of the new queen, so we move her rightward to the second column.

How the program works
...and we shift to the third column. Yet another conflict arises...

ROW 1, COL 1

filled = 2

ROW 2, COL 3

ROW 3, COL 3

Another conflict arises, so we move rightward to the third column.

How the program works
...and we shift to column 4. There's still a conflict in column 4, so we try to shift rightward again...

ROW 1, COL 1

filled = 2

ROW 2, COL 3

ROW 3, COL 4

Yet another conflict arises, so we move to the fourth column. The key idea is that each time we try a particular location for the new queen, we need to check whether the new location causes conflicts with our previous queens. If so, then we move the new queen to the next possible location.

How the program works
...but there's nowhere else to go.

ROW 1, COL 1

filled = 2

ROW 2, COL 3

ROW 3, COL 4

Sometimes we run out of possible locations for the new queen. This is where backtracking comes into play.

How the program works
When we run out of room in a row:
- pop the stack,
- reduce filled by 1,
- and continue working on the previous row.

ROW 1, COL 1

filled = 1

ROW 2, COL 3

To backtrack, we throw out the new queen altogether, popping the stack, reducing filled by 1, and returning to the previous row. At the previous row, we continue shifting the queen rightward.

How the program works
Now we continue working on row 2, shifting the queen to the right.

ROW 1, COL 1

filled = 1

ROW 2, COL 4

Notice that we continue the previous row from the spot where we left off. The queen shifts from column 3 to column 4. We don't return her back to column 1.

It is the use of the stack that lets us easily continue where we left off. The position of this previous queen is recorded in the stack, so we can just move the queen rightward one more position.

How the program works
This position has no conflicts, so we can increase filled by 1, and move to row 3.

ROW 1, COL 1

filled = 2

ROW 2, COL 4

The new position for row 2 has no conflicts, so we can increase filled by 1, and move again to row 3.

How the program works
In row 3, we start again at the first column.

ROW 1, COL 1

filled = 2

ROW 2, COL 4

ROW 3, COL 1

At the new row, we again start at the first column. So the general rules are:

When the algorithm moves forward, it always starts with the first column.

But when the algorithm backtracks, it continues wherever it left off.

Pseudocode for N-Queens
Initialize a stack where we can keep track of our decisions.
Place the first queen, pushing its position onto the stack and setting filled to 0.
repeat these steps
    if there are no conflicts with the queens...
    else if there is a conflict and there is room to shift the current queen rightward...
    else if there is a conflict and there is no room to shift the current queen rightward...

Here's the pseudocode for implementing the backtrack algorithm. The stack is initialized as an empty stack, and then we place the first queen.

After the initialization, we enter a loop with three possible actions at each iteration. We'll look at each action in detail...

Pseudocode for N-Queens
repeat these steps
    if there are no conflicts with the queens...

Increase filled by 1. If filled is now N, then the algorithm is done. Otherwise, move to the next row and place a queen in the first column.

The nicest possibility is when none of the queens have any conflicts. In this case, we can increase filled by 1.

If filled is now N, then we are done!

But if filled is still less than N, then we can move to the next row and place a queen in the first column. When this new queen is placed, we'll record its position in the stack.

Another aside: How do you suppose the program "checks for conflicts"? Hint: it helps if the stack is implemented in a way that permits the program to peek inside and see all of the recorded positions. Stack implementations often provide this "peek inside" operation, although the ability to actually change entries is still limited to the usual pushing and popping.
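A sketch of such a check, assuming the stack is an array of position records that can be read anywhere but is changed only at the top. The Position type and the function name top_has_conflict are hypothetical; rows and columns are 1-based, as in the slides. Only columns and diagonals need testing, since each row holds one queen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One stack record: the position of a placed queen (1-based). */
typedef struct { int row, col; } Position;

/* Returns true if the queen in the top record (index size - 1) attacks
   any queen recorded below it. The loop only reads ("peeks at") the
   entries; nothing below the top is modified. */
bool top_has_conflict(const Position stack[], int size)
{
    Position q = stack[size - 1];
    for (int i = 0; i < size - 1; i++) {
        Position p = stack[i];
        if (p.col == q.col) return true;                           /* same column */
        if (abs(p.row - q.row) == abs(p.col - q.col)) return true; /* same diagonal */
    }
    return false;   /* rows always differ: one queen per row */
}
```

On the example from the slides, a stack holding (1,1), (2,3), (3,4) reports a conflict (rows 2 and 3 share a diagonal), while (1,1), (2,4), (3,2) does not.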

Pseudocode for N-Queens
repeat these steps
    if there are no conflicts with the queens...
    else if there is a conflict and there is room to shift the current queen rightward...

Move the current queen rightward, adjusting the record on top of the stack to indicate the new position.

The second possibility is that a conflict arises, and the new queen has room to move rightward. In this case, we just move the new queen to the right.

Pseudocode for N-Queens
repeat these steps
    if there are no conflicts with the queens...
    else if there is a conflict and there is room to shift the current queen rightward...
    else if there is a conflict and there is no room to shift the current queen rightward...

Backtrack!
Keep popping the stack, and reducing filled by 1, until you reach a row where the queen can be shifted rightward. Shift this queen right.

The last possibility is that a conflict exists, but the new queen has run out of room. In this case we backtrack:

Pop the stack.
Reduce filled by 1.

We must keep doing these two steps until we find a row where the queen can be shifted rightward. In other words, until we find a row where the queen is not already at the end.

At that point, we shift the queen rightward, and continue the loop.

But there is one potential pitfall here!


The potential pitfall: Maybe the stack becomes empty during this popping. What would that indicate?

Answer: It means that we backtracked right back to the beginning, and ran out of possible places to place the first queen. In that case, the problem has no solution.
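Putting the three cases together, here is a C sketch of the pseudocode, with the stack held in an array of Position records. The names solve_n_queens, Position and top_has_conflict are hypothetical. Because the current queen is always the top record, filled equals the stack size minus one while a queen is being tried; the variable is kept anyway to mirror the slides. If backtracking empties the stack, the function reports that no solution exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int row, col; } Position;   /* 1-based board position */

/* Does the top queen attack any queen below it on the stack? */
static bool top_has_conflict(const Position s[], int size)
{
    Position q = s[size - 1];
    for (int i = 0; i < size - 1; i++)
        if (s[i].col == q.col || abs(s[i].row - q.row) == abs(s[i].col - q.col))
            return true;
    return false;
}

/* Backtracking search with an explicit stack. On success, col[r] holds the
   1-based column of the queen in row r + 1 and true is returned. */
bool solve_n_queens(int n, int col[])
{
    if (n < 1) return false;
    Position *stack = malloc((size_t)n * sizeof *stack);
    if (stack == NULL) return false;

    int size = 0;        /* records on the stack */
    int filled = 0;      /* rows whose queen has been accepted */
    bool solved = false;

    stack[size++] = (Position){1, 1};                    /* place the first queen */
    for (;;) {
        if (!top_has_conflict(stack, size)) {
            filled++;
            if (filled == n) { solved = true; break; }
            stack[size++] = (Position){filled + 1, 1};   /* next row, first column */
        } else if (stack[size - 1].col < n) {
            stack[size - 1].col++;                       /* room to shift right */
        } else {
            /* Backtrack: pop and reduce filled until a queen can shift right. */
            while (size > 0 && stack[size - 1].col == n) {
                size--;
                filled--;
            }
            if (size == 0) break;                        /* stack empty: no solution */
            stack[size - 1].col++;
        }
    }
    if (solved)
        for (int r = 0; r < n; r++) col[r] = stack[r].col;
    free(stack);
    return solved;
}
```

With this row-by-row, left-to-right search order, n = 4 yields columns 2, 4, 1, 3 for rows 1 to 4, while n = 2 and n = 3 empty the stack and report failure.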

Stacks - Relevance
Stacks appear in computer programs:
- Key to call / return in functions & procedures
- A stack frame allows recursive calls
- Call: push stack frame
- Return: pop stack frame
- A stack frame holds the function arguments, the return address, and the local variables

Stacks have many applications. The application which we have shown is called backtracking. The key to backtracking: each choice is recorded in a stack. When you run out of choices for the current decision, you pop the stack, and continue trying different choices for the previous decision.

Summary

A quick summary . . .

Stacks and Queues
- Array Stack Implementation
- Linked Stack Implementation
- Queue Abstract Data Type (ADT)
- Queue ADT Interface
- Queue Design Considerations

Queue Abstract Data Type
A queue is a linear collection where the elements are added to one end and removed from the other end.
- The processing is first in, first out (FIFO)
- The first element put on the queue is the first element removed from the queue
- Think of a line of people waiting for a bus (the British call that queuing up)

A Conceptual View of a Queue
[Figure: elements are added at the rear of the queue (tail) and removed from the front of the queue (head)]

Queue Terminology
- We enqueue an element on a queue to add one
- We dequeue an element off a queue to remove one
- We can also examine the first element without removing it
- We can determine whether a queue is empty and how many elements it contains (its size)
The L&C QueueADT interface supports the above operations and some typical class operations such as toString().

Queue Design Considerations
Although a queue can be empty, there is no concept of it being full. An implementation must be designed to manage storage space.
For the first and dequeue operations on an empty queue, this implementation will throw an exception. Other implementations could return a null value, meaning there is nothing to return.

Queue Design Considerations
No iterator method is provided. That would be inconsistent with restricting access to the first element of the queue. If we need an iterator or other mechanism to access the elements in the middle or at the end of the collection, then a queue is not the appropriate data structure to use.

Queues
This ADT covers a set of objects as well as the operations performed on them: queueinit(Q) creates the necessary structured space in computer memory to hold the objects of Q; put(x) inserts x into Q; get deletes the object that has been residing in Q the longest; head returns the object that has been residing in Q the longest; kill(Q) releases the memory occupied by Q. The operations on a queue obey the FIFO property: First-In-First-Out. This is a logical constraint, or logical condition. The queue is a dynamic data set with limited access to its objects. An application that illustrates the use of a queue is the simulation of a queueing system (a system with waiting lines), implemented using the built-in pointer type.

Queue implementation
Just as with stacks, queues can be implemented using arrays or lists. First of all, let's consider the implementation using arrays. Define an array for storing the queue elements, and two markers: one pointing to the location of the head of the queue, the other to the first empty space following the tail. When an item is to be added to the queue, a test is made to see whether the tail marker points to a valid location; then the item is added to the queue and the tail marker is incremented by 1. When an item is to be removed from the queue, a test is made to see whether the queue is empty and, if not, the item at the location pointed to by the head marker is retrieved and the head marker is incremented by 1.

Queue implementation
This procedure works well until the first time the tail marker reaches the end of the array. If some removals have occurred during this time, there will be empty space at the beginning of the array. However, because the tail marker points to the end of the array, the queue is thought to be "full" and no more data can be added. We could shift the data so that the head of the queue returns to the beginning of the array each time this happens, but shifting data is costly in terms of computer time, especially if the data being stored in the array consist of large data objects.

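Queue implementation
The way around this is to treat the array as circular: when a marker moves past the last location, it wraps around to location 0, so the space freed at the beginning of the array is reused without shifting any data. We may now formalize the algorithms for dealing with queues in a circular array:
- Creating an empty queue: set Head = Tail = 0.
- Testing if a queue is empty: is Head == Tail?
- Testing if a queue is full: is (Tail + 1) mod QSIZE == Head?
- Adding an item to a queue: if the queue is not full, add the item at location Tail and set Tail = (Tail + 1) mod QSIZE.
- Removing an item from a queue: if the queue is not empty, remove the item from location Head and set Head = (Head + 1) mod QSIZE.
With this full test, one array location always stays unused, so the queue holds at most QSIZE - 1 items; otherwise a full queue would also satisfy Head == Tail and could not be told apart from an empty one.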
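A C sketch of these five algorithms, storing int items for simplicity. QSIZE = 8 and the function names are arbitrary choices for illustration. Every index update uses mod QSIZE, which is what makes the array behave as a circle.

```c
#include <assert.h>
#include <stdbool.h>

#define QSIZE 8   /* hypothetical array size; the queue holds at most QSIZE - 1 items */

typedef struct {
    int items[QSIZE];
    int head;   /* location of the item at the front */
    int tail;   /* first empty location after the rear */
} Queue;

/* Creating an empty queue: Head = Tail = 0. */
void queue_create(Queue *q) { q->head = q->tail = 0; }

bool queue_is_empty(const Queue *q) { return q->head == q->tail; }

bool queue_is_full(const Queue *q) { return (q->tail + 1) % QSIZE == q->head; }

/* Adding: store at Tail, then advance Tail circularly. */
bool enqueue(Queue *q, int item)
{
    if (queue_is_full(q)) return false;
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QSIZE;
    return true;
}

/* Removing: take the item at Head, then advance Head circularly. */
bool dequeue(Queue *q, int *item)
{
    if (queue_is_empty(q)) return false;
    *item = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    return true;
}
```

After two removals from a full queue, two more items can be added: the second one wraps around into location 0 instead of requiring the data to be shifted.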

Linked list


Linked list
The List ADT
A list is one of the most fundamental data structures used to store a collection of data items. The importance of the List ADT is that it can be used to implement a wide variety of other ADTs. That is, the LIST ADT often serves as a basic building block in the construction of more complicated ADTs. A list may be defined as a dynamic ordered n-tuple: L = (l1, l2, ..., ln).

Linked list
The use of the term dynamic in this definition is meant to emphasize that the elements in this n-tuple may change over time. Notice that these elements have a linear order that is based upon their position in the list. The first element in the list, l1, is called the head of the list. The last element, ln, is referred to as the tail of the list. The number of elements in a list L is referred to as the length of the list. Thus the empty list, represented by (), has length 0. A list can be homogeneous or heterogeneous.

Linked list
0. Initialize(L). This operation allocates the memory for the list and gives a structure to it.
1. Insert(L, x, i). Adds element x to L at position i, causing the length of the list to become n+1. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
2. Append(L, x). Adds element x to the tail of L, causing the length of the list to become n+1. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
3. Retrieve(L, i). Returns the element stored at position i of L, or the null value if position i does not exist.
4. Delete(L, i). Deletes the element stored at position i of L, causing the elements after it to shift one position toward the head.
5. Length(L). Returns the length of L.

Linked list
6. Reset(L). Resets the current position in L to the head (i.e., to position 1) and returns the value 1. If the list is empty, the value 0 is returned.
7. Current(L). Returns the current position in L.
8. Next(L). Increments and returns the current position in L.
Note that only the Insert, Append, Delete, Reset, and Next operations modify the lists to which they are applied (Reset and Next change only the current position). The remaining operations simply query lists in order to obtain information about them.
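The ADT does not fix a representation. As one possible sketch, the following C code implements Initialize, Insert, Append, Retrieve, Delete and Length over a fixed-size array with 1-based positions; MAX_LEN and the list_* names are hypothetical, and Reset/Current/Next are omitted. The shifting loops show why, with an array, Insert and Delete at position i cost time proportional to n - i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEN 100   /* hypothetical fixed capacity */

typedef struct {
    void *items[MAX_LEN];   /* position p is stored at index p - 1 */
    int length;
} List;

/* Initialize(L) */
void list_init(List *L) { L->length = 0; }

/* Length(L) */
int list_length(const List *L) { return L->length; }

/* Insert(L, x, i): x becomes element i (1 <= i <= n + 1); later elements shift right. */
bool list_insert(List *L, void *x, int i)
{
    if (L->length == MAX_LEN || i < 1 || i > L->length + 1) return false;
    for (int k = L->length; k >= i; k--)
        L->items[k] = L->items[k - 1];
    L->items[i - 1] = x;
    L->length++;
    return true;
}

/* Append(L, x): insert at position n + 1. */
bool list_append(List *L, void *x) { return list_insert(L, x, L->length + 1); }

/* Retrieve(L, i): element at position i, or NULL if position i does not exist. */
void *list_retrieve(const List *L, int i)
{
    return (i >= 1 && i <= L->length) ? L->items[i - 1] : NULL;
}

/* Delete(L, i): remove element i; later elements shift one position left. */
bool list_delete(List *L, int i)
{
    if (i < 1 || i > L->length) return false;
    for (int k = i; k < L->length; k++)
        L->items[k - 1] = L->items[k];
    L->length--;
    return true;
}
```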

Linked lists
Flexible space use:
- Dynamically allocate space for each element as needed
- Include a pointer to the next item
Linked list: each node of the list contains
- the data item (an object pointer in our ADT)
- a pointer to the next node
[Figure: a node with Data and Next fields; Data points to the object]

Linked lists
The collection structure has a pointer to the list head, initially NULL.
Add first item:
- Allocate space for node
- Set its data pointer to object
- Set Next to NULL
- Set Head to point to new node
[Figure: the collection's Head points to the node, whose Data points to the object and whose Next is NULL]

Linked lists
Add second item:
- Allocate space for node
- Set its data pointer to object
- Set Next to current Head
- Set Head to point to new node
[Figure: Head points to the new node (Data: object2), whose Next points to the first node (Data: object)]


Linked lists - Add implementation

#include <stdlib.h>

#define TRUE 1

struct t_node {
    void *item;
    struct t_node *next;   /* recursive type definition: C allows it! */
};
typedef struct t_node *Node;

struct collection {
    Node head;
};
typedef struct collection *Collection;

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;
}

Error checking and asserts omitted for clarity!

Linked lists
- Add time: constant, independent of n
- Search time: worst case proportional to n
[Figure: Head points to the node for object2, whose Next points to the node for object]
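To see where the worst case n comes from, here is a search routine to go with the add routine, as a self-contained C sketch. The slides compare keys with KeyCmp and ItemKey; this sketch swaps those for a hypothetical match callback (FindInCollection, MatchFn and int_match are illustrative names). The loop may follow every next pointer before finding the key, or before discovering that it is absent, whereas adding at the head never looks past the first node.

```c
#include <assert.h>
#include <stdlib.h>

#define TRUE  1
#define FALSE 0

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;
struct collection { Node head; };
typedef struct collection *Collection;

/* Add at the head: constant time, independent of n. */
int AddToCollection(Collection c, void *item)
{
    Node new_node = malloc(sizeof *new_node);
    if (new_node == NULL) return FALSE;
    new_node->item = item;
    new_node->next = c->head;
    c->head = new_node;
    return TRUE;
}

/* Hypothetical comparison callback: returns 0 when item matches key. */
typedef int (*MatchFn)(const void *item, const void *key);

int int_match(const void *item, const void *key)
{
    return *(const int *)item == *(const int *)key ? 0 : 1;
}

/* Linear search from the head: up to n nodes are visited. */
void *FindInCollection(Collection c, const void *key, MatchFn match)
{
    for (Node n = c->head; n != NULL; n = n->next)
        if (match(n->item, key) == 0)
            return n->item;
    return NULL;   /* key not present */
}
```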


Linked lists - Delete implementation

void *DeleteFromCollection( Collection c, void *key ) {
    Node n, prev;
    n = prev = c->head;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 ) {
            prev->next = n->next;   /* unlink n */
            return n->item;         /* return the item, not the node */
        }
        prev = n;
        n = n->next;
    }
    return NULL;
}

[Figure: head points to the first node of the list]

A minor addition is needed to allow for deleting this one (the node at the head)! An exercise!

Linked lists - LIFO and FIFO
Simplest implementation:
- Add to head
- Last-In-First-Out (LIFO) semantics
Modifications for First-In-First-Out (FIFO):
- Keep a tail pointer

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;
struct collection { Node head, tail; };

tail is set in the AddToCollection method if head == NULL.
[Figure: head points to the first node and tail to the last node]
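A possible C sketch of the FIFO variant: new items are appended at the tail in constant time, and the oldest item is removed from the head. The names AddToTail and RemoveFromHead are hypothetical. Besides setting tail when the list is empty, the code must also reset tail to NULL when the last node is removed, or tail would be left pointing at freed memory.

```c
#include <assert.h>
#include <stdlib.h>

#define TRUE  1
#define FALSE 0

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;
struct collection { Node head, tail; };
typedef struct collection *Collection;

/* Add at the tail: constant time thanks to the tail pointer. */
int AddToTail(Collection c, void *item)
{
    Node new_node = malloc(sizeof *new_node);
    if (new_node == NULL) return FALSE;
    new_node->item = item;
    new_node->next = NULL;
    if (c->head == NULL)        /* empty list: the node is both head and tail */
        c->head = new_node;
    else
        c->tail->next = new_node;
    c->tail = new_node;
    return TRUE;
}

/* Remove from the head: the oldest item leaves first (FIFO). */
void *RemoveFromHead(Collection c)
{
    Node n = c->head;
    void *item;
    if (n == NULL) return NULL;
    item = n->item;
    c->head = n->next;
    if (c->head == NULL)        /* list became empty */
        c->tail = NULL;
    free(n);
    return item;
}
```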

Dynamic set ADT
The concept of a set serves as the basis for a wide variety of useful abstract data types. A large number of computer applications involve the manipulation of sets of data elements. Thus, it makes sense to investigate data structures and algorithms that support efficient implementation of various operations on sets. An important difference between the mathematical concept of a set and the sets considered in computer science is that a set in mathematics is unchanging, while the sets in CS are considered to change over time as data elements are added or deleted. Thus, sets are referred to here as dynamic sets. In addition, we will assume that each element in a dynamic set contains an identifying field called a key, and that a total ordering relationship exists on these keys. It will be assumed that no two elements of a dynamic set contain the same key.

Dynamic set ADT
The concept of a dynamic set is to be specified as a DYNAMIC SET ADT, that is, as a collection of data elements along with the legal operations defined on these data elements. If the DYNAMIC SET ADT is implemented properly, application programmers will be able to use dynamic sets without having to understand their implementation details. The use of ADTs in this manner simplifies design and development, and promotes reusability of software components. The general operations of the DYNAMIC SET ADT are listed below; in each of these operations, S represents a specific dynamic set:

Dynamic set ADT
- Search(S, k). Returns the element with key k in S, or the null value if an element with key k is not in S.
- Insert(S, x). Adds element x to S. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
- Delete(S, k). Removes the element with key k in S. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
- Minimum(S). Returns the element in dynamic set S that has the smallest key value, or the null value if S is empty.
- Maximum(S). Returns the element in S that has the largest key value, or the null value if S is empty.
- Predecessor(S, k). Returns the element in S that has the largest key value less than k, or the null value if no such element exists.
- Successor(S, k). Returns the element in S that has the smallest key value greater than k, or the null value if no such element exists.
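As an illustration of how one representation supports these queries, the sketch below keeps the keys of S in a sorted C array of distinct ints and answers Search, Predecessor and Successor by binary search in O(log n) time, and Minimum and Maximum in O(1). The function names are hypothetical, and Insert and Delete are omitted (on a sorted array they need O(n) shifting). Results come back through an out-parameter, with a false return standing in for the null value.

```c
#include <assert.h>
#include <stdbool.h>

/* Index of the first key >= k in keys[0..n-1] (n if there is none). */
static int lower_bound(const int keys[], int n, int k)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < k) lo = mid + 1; else hi = mid;
    }
    return lo;
}

bool set_search(const int keys[], int n, int k)
{
    int i = lower_bound(keys, n, k);
    return i < n && keys[i] == k;
}

bool set_minimum(const int keys[], int n, int *out)
{
    if (n == 0) return false;
    *out = keys[0];
    return true;
}

bool set_maximum(const int keys[], int n, int *out)
{
    if (n == 0) return false;
    *out = keys[n - 1];
    return true;
}

/* Predecessor(S, k): largest key less than k. */
bool set_predecessor(const int keys[], int n, int k, int *out)
{
    int i = lower_bound(keys, n, k);   /* keys[0..i-1] are all < k */
    if (i == 0) return false;
    *out = keys[i - 1];
    return true;
}

/* Successor(S, k): smallest key greater than k. */
bool set_successor(const int keys[], int n, int k, int *out)
{
    int i = lower_bound(keys, n, k);
    if (i < n && keys[i] == k) i++;    /* skip k itself; keys are distinct */
    if (i == n) return false;
    *out = keys[i];
    return true;
}
```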

Dynamic set ADT
In many instances an application will only require the use of a few DYNAMIC SET operations. Some groups of these operations are used so frequently that they are given special names: the ADT that supports the Search, Insert, and Delete operations is called the DICTIONARY ADT; the STACK, QUEUE, and PRIORITY QUEUE ADTs are all special types of dynamic sets. A variety of data structures will be described later that can be used to implement either the DYNAMIC SET ADT or ADTs that support specific subsets of the DYNAMIC SET ADT operations. Each of the data structures described will be analyzed in order to determine how efficiently it supports the implementation of these operations. In each case, the analysis will be performed in terms of n, the number of data elements stored in the dynamic set.

Generalized queue
Stacks and FIFO queues identify items according to the time at which they were inserted into the queue. Alternatively, these abstract concepts may be described in terms of a sequential listing of the items in order, referring to the basic operations of inserting and deleting items at the beginning and the end of the list:
- if we insert at the end and delete at the end, we get a stack (exactly as in the array implementation);
- if we insert at the beginning and delete at the beginning, we also get a stack (exactly as in the linked-list implementation);
- if we insert at the end and delete at the beginning, we get a FIFO queue (exactly as in the linked-list implementation);
- if we insert at the beginning and delete at the end, we also get a FIFO queue (this option does not correspond to any of the implementations given).

Generalized queue
Specifically, pushdown stacks and FIFO queues are special instances of a more general ADT: the generalized queue. Instances of generalized queues differ only in the rule used when items are removed: for stacks, the rule is "remove the item that was most recently inserted"; for FIFO queues, the rule is "remove the item that was least recently inserted"; there are many other possibilities to consider. A powerful alternative is the random queue, which uses the rule "remove a random item".

Generalized queue
The algorithm can expect to get any of the items on the queue with equal probability. The operations of a random queue can be implemented:
- in constant time using an array representation (this requires reserving space ahead of time);
- using a linked-list alternative (which is less attractive, however, because implementing both insertion and deletion efficiently is a challenging task).
Random queues can be used as the basis for randomized algorithms, to avoid, with high probability, worst-case performance scenarios.
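A C sketch of the array representation, assuming int items and a capacity RQ_MAX reserved ahead of time (both hypothetical, as are the rq_* names). To remove a random item in constant time, the chosen slot is overwritten with the last item and the count is decreased, so no shifting is needed. The choice uses rand() % n, which is only approximately uniform; an application that needs exact uniformity would want a better generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RQ_MAX 100   /* hypothetical capacity, reserved ahead of time */

typedef struct {
    int items[RQ_MAX];
    int n;           /* number of items currently in the queue */
} RandomQueue;

void rq_init(RandomQueue *q) { q->n = 0; }

/* Insert: append at the end of the array, O(1). */
bool rq_insert(RandomQueue *q, int item)
{
    if (q->n == RQ_MAX) return false;
    q->items[q->n++] = item;
    return true;
}

/* Remove a randomly chosen item in O(1): copy it out, then move the
   last item into its slot and shrink the array by one. */
bool rq_remove(RandomQueue *q, int *item)
{
    if (q->n == 0) return false;
    int i = rand() % q->n;
    *item = q->items[i];
    q->items[i] = q->items[--q->n];
    return true;
}
```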

Generalized queue
Building on this point of view, the deque (double-ended queue) ADT may be defined, where insertion and deletion are allowed at either end. Implementing a deque is a good programming exercise. The priority queue ADT is another example of a generalized queue. The items in a priority queue have keys, and the rule for deletion is "remove the item with the smallest key". The priority queue ADT is useful in a variety of applications, and the problem of finding efficient implementations for this ADT has been a research goal in computer science for many years.

Heaps and the heapsort
- Heaps and priority queues
- Heap structure and position numbering
- Heap structure property
- Heap ordering property
- Removal of top priority node
- Inserting a new node into the heap
- The heap sort
- Source code for heap sort program

Heaps and priority queues
A heap is a data structure used to implement an efficient priority queue. The idea is to make it efficient to extract the element with the highest priority, i.e., the next item in the queue to be processed. We could use a sorted linked list, with O(1) operations to remove the highest-priority node and O(N) to insert a node. Using a tree structure makes both operations O(log₂ N), so a sequence of N insertions and N removals costs O(N log N) instead of O(N²).

Heap structure and position numbering 1
A heap can be visualised as a binary tree in which every layer is filled from the left. For every layer to be full, the tree would have to have a size exactly equal to 2^n - 1, i.e., a value for size in the series 1, 3, 7, 15, 31, 63, 127, 255, etc.

So to be practical enough to allow for any particular size, a heap has every layer filled except for the bottom layer which is filled from the left.

Heap structure and position numbering 2

Heap structure and position numbering 3

In the above diagram nodes are labelled based on position, and not their contents. Also note that the left child of each node is numbered node*2 and the right child is numbered node*2+1. The parent of every node is obtained using integer division (throwing away the remainder), so that node i's parent has position i/2.

Because this numbering system makes it very easy to move between nodes and their children or parents, a heap is commonly implemented as an array with element 0 unused.

Heap Properties
A heap T storing n keys has height h = ⌈log₂(n + 1)⌉, which is O(log n).

Heap ordering

Heap Insertion
Insert 6

Heap Insertion
Add key in next available position

Heap Insertion
Begin upheap

Heap Insertion

Heap Insertion
Terminate upheap when:
- we reach the root, or
- key(child) is greater than key(parent)
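The insertion steps can be sketched in C for a min heap of int keys stored in an array with element 0 unused, using the position arithmetic described earlier (the parent of i is i/2). The MinHeap type, HEAP_MAX and heap_insert are hypothetical names. The loop is the upheap: it swaps the new key with its parent until it reaches the root or the parent's key is no larger.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_MAX 64   /* hypothetical capacity */

/* Min heap of int keys; key[0] is unused so that the parent of
   position i is i/2 and its children are 2i and 2i+1. */
typedef struct {
    int key[HEAP_MAX + 1];
    int size;
} MinHeap;

void heap_init(MinHeap *h) { h->size = 0; }

/* Insert k: put it in the next available position (keeping every layer
   filled from the left), then upheap. Returns false if the heap is full. */
bool heap_insert(MinHeap *h, int k)
{
    if (h->size == HEAP_MAX) return false;
    int i = ++h->size;
    h->key[i] = k;
    /* Upheap: stop at the root or once the parent's key is <= the child's. */
    while (i > 1 && h->key[i / 2] > h->key[i]) {
        int tmp = h->key[i / 2];
        h->key[i / 2] = h->key[i];
        h->key[i] = tmp;
        i /= 2;
    }
    return true;
}
```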

Removal of top priority node
The rest of these notes assume a min heap will be used. Removal of the top node creates a hole at the top which is "bubbled" downwards by moving values below it upwards, until the hole is in a position where it can be replaced with the rightmost node from the bottom layer. This process restores the heap ordering property.

Heap Removal
Remove element from priority queues? removeMin( )

Heap Removal
Begin downheap

Heap Removal

Heap Removal

Heap Removal
Terminate downheap when:
- we reach the leaf level, or
- key(parent) is less than or equal to key(child)
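A C sketch of removeMin using the "hole" formulation described above: the root's key is returned, the rightmost key of the bottom layer is taken out, and the hole at the root moves down toward the smaller child until that key fits. The names are hypothetical, and the same array layout (element 0 unused) is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_MAX 64   /* hypothetical capacity */

/* key[1..size] is a min heap; key[0] is unused. */
typedef struct {
    int key[HEAP_MAX + 1];
    int size;
} MinHeap;

/* removeMin: store the smallest key in *min and restore heap order.
   Returns false if the heap is empty. */
bool heap_remove_min(MinHeap *h, int *min)
{
    if (h->size == 0) return false;
    *min = h->key[1];
    int last = h->key[h->size--];   /* rightmost key of the bottom layer */
    int hole = 1, child;
    /* Downheap: move the hole toward the smaller child until `last` fits. */
    while ((child = 2 * hole) <= h->size) {
        if (child < h->size && h->key[child + 1] < h->key[child])
            child++;                          /* pick the smaller child */
        if (h->key[child] >= last)
            break;                            /* parent <= child: order restored */
        h->key[hole] = h->key[child];         /* child moves up, hole moves down */
        hole = child;
    }
    h->key[hole] = last;
    return true;
}
```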

The heap sort
Using a heap to sort data involves performing N insertions followed by N delete min operations as described above. Memory usage will depend upon whether the data already exists in memory or whether the data is on disk. Allocating the array to be used to store the heap will be more efficient if N, the number of records, can be known in advance. Dynamic allocation of the array will then be possible, and this is likely to be preferable to preallocating the array.
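A self-contained C sketch of this heap sort, allocating the heap array dynamically once N is known, as suggested above. It performs the N insertions (upheap) and then N delete-min operations (downheap), writing the keys back in ascending order; the name heap_sort is hypothetical. Each phase costs O(N log N). Unlike the usual in-place heapsort, this version needs O(N) extra memory for the separate heap array.

```c
#include <assert.h>
#include <stdlib.h>

/* Sorts a[0..n-1] into ascending order via a min heap stored in
   heap[1..size] (heap[0] unused). Returns 0 on success, -1 if the
   heap array cannot be allocated. */
int heap_sort(int a[], int n)
{
    int *heap = malloc(((size_t)n + 1) * sizeof *heap);
    if (heap == NULL) return -1;
    int size = 0;

    for (int j = 0; j < n; j++) {             /* N insertions */
        int i = ++size;
        while (i > 1 && heap[i / 2] > a[j]) { /* upheap: move larger parents down */
            heap[i] = heap[i / 2];
            i /= 2;
        }
        heap[i] = a[j];
    }
    for (int j = 0; j < n; j++) {             /* N delete-min operations */
        a[j] = heap[1];
        int last = heap[size--];
        int hole = 1, child;
        while ((child = 2 * hole) <= size) {  /* downheap */
            if (child < size && heap[child + 1] < heap[child]) child++;
            if (heap[child] >= last) break;
            heap[hole] = heap[child];
            hole = child;
        }
        heap[hole] = last;
    }
    free(heap);
    return 0;
}
```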

Heaps
A heap is a binary tree T that stores key-element pairs at its internal nodes. It satisfies two properties:
- MinHeap: key(parent) ≤ key(child) [OR MaxHeap: key(parent) ≥ key(child)]
- all levels are full, except the last one, which is left-filled

What are Heaps Useful for?
- To implement priority queues
- Priority queue = a queue where all elements have a priority associated with them
- Remove in a priority queue removes the element with the smallest priority
- Operations: insert, removeMin

Heap or Not a Heap?

ADT for Min Heap
objects: n > 0 elements organized in a binary tree so that the value in each node is no larger than the values in its children
methods:
- Heap Create(MAX_SIZE) ::= create an empty heap that can hold a maximum of max_size elements
- Boolean HeapFull(heap, n) ::= if (n == max_size) return TRUE else return FALSE
- Heap Insert(heap, item, n) ::= if (!HeapFull(heap, n)) insert item into heap and return the resulting heap else return error
- Boolean HeapEmpty(heap, n) ::= if (n > 0) return FALSE else return TRUE
- Element Delete(heap, n) ::= if (!HeapEmpty(heap, n)) return one instance of the smallest element in the heap and remove it from the heap else return error

Building a Heap
build (n + 1)/2 trivial one-element heaps

build three-element heaps on top of them

Building a Heap
downheap to preserve the order property

now form seven-element heaps

Building a Heap

Building a Heap
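The slides build the heap bottom-up by merging small heaps under new roots. On the array layout, the same idea becomes the loop below (a hypothetical C sketch for a min heap, element 0 unused): positions n/2 + 1 to n are leaves, i.e. the trivial one-element heaps, and calling downheap on positions n/2 down to 1 joins two already-built heaps under each new root. This works for any n, not only n = 2^h - 1, and runs in O(n) time overall.

```c
#include <assert.h>

/* Restore min-heap order in the subtree rooted at position i of a[1..n]
   by moving the root key down past any smaller children. */
static void downheap(int a[], int n, int i)
{
    int key = a[i], child;
    while ((child = 2 * i) <= n) {
        if (child < n && a[child + 1] < a[child]) child++;  /* smaller child */
        if (a[child] >= key) break;
        a[i] = a[child];
        i = child;
    }
    a[i] = key;
}

/* Bottom-up construction: leaves are already heaps, so work upward
   from the last internal node, merging pairs of heaps under each root. */
void build_heap(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)
        downheap(a, n, i);
}
```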
