8/21/2019 A Practical Introduction to Data Structures and Algorithm Analysis
Coursenotes
A Practical Introduction to
Data Structures and Algorithm Analysis
Second Edition
Clifford A. Shaffer
Department of Computer Science
Virginia Tech
Copyright 2000, 2001
Last Updated: 01/10/2003
The Need for Data Structures
Data structures organize data => more efficient programs.
More powerful computers => more complex applications.
More complex applications demand more calculations.
Complex computing tasks are unlike our everyday experience.
Organizing Data
Any organization for a collection of records can be searched, processed in any order, or modified.
The choice of data structure and algorithm can make the difference between a program running in a few seconds or many days.
Efficiency
A solution is said to be efficient if it solves
the problem within its resource constraints.
Space
Time
The cost of a solution is the amount of resources that the solution consumes.
Selecting a Data Structure
Select a data structure as follows:
1. Analyze the problem to determine the resource constraints a solution must meet.
2. Determine the basic operations that must be supported. Quantify the resource constraints for each operation.
3. Select the data structure that best meets these requirements.
Some Questions to Ask
Are all data inserted into the data structure at the beginning, or are insertions interspersed with other operations?
Can data be deleted?
Are all data processed in some well-defined order, or is random access allowed?
Data Structure Philosophy
Each data structure has costs and benefits.
Rarely is one data structure better than another in all situations.
A data structure requires:
space for each data item it stores,
time to perform each basic operation,
programming effort.
Data Structure Philosophy (cont)
Each problem has constraints on available space and time.
Only after a careful analysis of problem characteristics can we know the best data structure for the task.
Bank example:
  Start account: a few minutes
  Transactions: a few seconds
  Close account: overnight
Goals of this Course
1. Reinforce the concept that costs and benefits exist for every data structure.
2. Learn the commonly used data structures.
   These form a programmer's basic data structure "toolkit."
3. Understand how to measure the cost of a data structure or program.
   These techniques also allow you to judge the merits of new data structures that you or others might invent.
Abstract Data Types
Abstract Data Type (ADT): a definition for a data type solely in terms of a set of values and a set of operations on that data type.
Each ADT operation is defined by its inputs and outputs.
Encapsulation: Hide implementation details.
Data Structure
A data structure is the physical implementation of an ADT.
  Each operation associated with the ADT is implemented by one or more subroutines in the implementation.
Data structure usually refers to an organization for data in main memory.
File structure is an organization for data on peripheral storage, such as a disk drive.
Metaphors
An ADT manages complexity through abstraction: metaphor.
  Hierarchies of labels
  Ex: transistors => gates => CPU.
In a program, implement an ADT, then think only about the ADT, not its implementation.
Logical vs. Physical Form
Data items have both a logical and a physical form.
Logical form: definition of the data item within an ADT.
  Ex: Integers in mathematical sense: +, -
Physical form: implementation of the data item within a data structure.
  Ex: 16/32 bit integers, overflow.
Data Type
[Figure: relationship between the logical and physical levels]
ADT: Type, Operations
Data Items: Logical Form
Data Items: Physical Form
Data Structure: Storage Space, Subroutines
Problems
Problem: a task to be performed.
Best thought of as inputs and matching outputs.
Problem definition should include constraints on the resources that may be consumed by any acceptable solution.
Problems (cont)
Problems <=> mathematical functions
  A function is a matching between inputs (the domain) and outputs (the range).
  An input to a function may be a single number, or a collection of information.
  The values making up an input are called the parameters of the function.
  A particular input must always result in the same output every time the function is computed.
Algorithms and Programs
Algorithm: a method or a process followed to solve a problem.
  A recipe.
An algorithm takes the input to a problem (function) and transforms it to the output.
  A mapping of input to output.
A problem can have many algorithms.
Algorithm Properties
An algorithm possesses the following properties:
  It must be correct.
  It must be composed of a series of concrete steps.
  There can be no ambiguity as to which step will be performed next.
  It must be composed of a finite number of steps.
  It must terminate.
A computer program is an instance, or concrete representation, for an algorithm in some programming language.
Mathematical Background
Set concepts and notation.
Recursion
Induction Proofs
Logarithms
Summations
Recurrence Relations
Estimation Techniques
Known as "back of the envelope" or "back of the napkin" calculation.
1. Determine the major parameters that affect the problem.
2. Derive an equation that relates the parameters to the problem.
3. Select values for the parameters, and apply the equation to yield an estimated solution.
Estimation Example
How many library bookcases does it take to store books totaling one million pages?
Estimate:
  Pages/inch
  Feet/shelf
  Shelves/bookcase
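The three estimates can be chained into a single calculation. The sketch below is only an illustration: the values 500 pages per inch, 4 feet per shelf, and 5 shelves per bookcase are assumed figures chosen for the example, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// All three figures are illustrative guesses, not values from the notes.
const double kPagesPerInch = 500.0;      // assumed paper thickness
const double kFeetPerShelf = 4.0;        // assumed shelf width
const double kShelvesPerBookcase = 5.0;  // assumed bookcase height

// Estimate the number of bookcases needed to hold `pages` pages.
int bookcasesNeeded(double pages) {
  assert(pages >= 0);
  double inches = pages / kPagesPerInch;   // shelf inches required
  double feet = inches / 12.0;             // convert inches to feet
  double shelves = feet / kFeetPerShelf;   // shelves required
  return static_cast<int>(std::ceil(shelves / kShelvesPerBookcase));
}
```

With these assumed values, one million pages needs about 2,000 inches of shelf, or roughly 167 feet, which rounds up to 9 bookcases. Changing any one guess by a factor of two changes the answer by the same factor, which is the kind of sensitivity a quick estimate is meant to expose.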
Algorithm Efficiency
There are often many approaches (algorithms) to solve a problem. How do we choose between them?
At the heart of computer program design are two (sometimes conflicting) goals.
1. To design an algorithm that is easy to understand, code, debug.
2. To design an algorithm that makes efficient use of the computer's resources.
Algorithm Efficiency (cont)
Goal (1) is the concern of Software Engineering.
Goal (2) is the concern of data structures and algorithm analysis.
When goal (2) is important, how do we measure an algorithm's cost?
How to Measure Efficiency?
1. Empirical comparison (run programs)
2. Asymptotic Algorithm Analysis
Critical resources:
Factors affecting running time:
For most algorithms, running time depends on size of the input.
Running time is expressed as T(n) for some function T on input size n.
Examples of Growth Rate
Example 1.
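// Find position of largest value
int largest(int array[], int n) {
  int currlarge = 0;           // Position of largest value seen
  for (int i=1; i<n; i++)      // For each remaining element
    if (array[currlarge] < array[i])
      currlarge = i;           // Remember its position
  return currlarge;            // Return position of largest
}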
Examples (cont)
Example 2: Assignment statement.
Example 3:
sum = 0;
for (i=1; i
Growth Rate Graph
Best, Worst, Average Cases
Not all inputs of a given size take the same time to run.
Sequential search for K in an array of n integers:
  Begin at first element in array and look at each element in turn until K is found
Best case:
Worst case:
Average case:
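The three cases can be observed directly by counting how many elements a sequential search examines. The following is a minimal sketch, not code from the notes: the function name and the `examined` counter are hypothetical additions for illustration. It examines 1 element in the best case (K is first), n in the worst case (K is last or absent), and about n/2 on average if K is equally likely to be at any position.

```cpp
#include <cassert>

// Sequential search for K in array[0..n-1].
// Returns the position of K, or n if K is absent.
// `examined` reports how many elements were looked at.
int seqSearch(const int array[], int n, int K, int& examined) {
  assert(n >= 0);
  examined = 0;
  for (int i = 0; i < n; i++) {
    examined++;                    // one more element inspected
    if (array[i] == K) return i;   // found it
  }
  return n;                        // not found
}
```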
Which Analysis to Use?
While average time appears to be the fairest measure, it may be difficult to determine.
When is the worst case time important?
Faster Computer or Algorithm?
What happens when we buy a computer 10 times faster?

T(n)        n       n'       Change                   n'/n
10n         1,000   10,000   n' = 10n                 10
20n         500     5,000    n' = 10n                 10
5n log n    250     1,842    sqrt(10) n < n' < 10n    7.37
2n^2        70      223      n' = sqrt(10) n          3.16
2^n         13      16       n' = n + 3               -----
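Several table entries can be reproduced by searching for the largest input whose cost fits a fixed budget of operations. The sketch below assumes the budget is 10,000 operations on the old machine and 100,000 on the new one (inferred from the 10n row, not stated in the notes); the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Largest n with T(n) <= budget, found by a simple upward scan.
// T must be nondecreasing; `budget` is the number of basic
// operations the machine can perform in the allotted time.
template <typename Cost>
long largestInput(Cost T, double budget) {
  assert(budget >= 0);
  long n = 0;
  while (T(n + 1) <= budget) n++;
  return n;
}

double linear(long n)      { return 10.0 * n; }       // T(n) = 10n
double quadratic(long n)   { return 2.0 * n * n; }    // T(n) = 2n^2
double exponential(long n) {                          // T(n) = 2^n
  return std::pow(2.0, static_cast<double>(n));
}
```

For example, `largestInput(quadratic, 10000.0)` and `largestInput(quadratic, 100000.0)` give the 70 and 223 shown in the 2n^2 row: a tenfold speedup buys only about a 3.16-fold increase in problem size.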
Asymptotic Analysis: Big-oh
Definition: For T(n) a non-negatively valued function, T(n) is in the set O(f(n)) if there exist two positive constants c and n0 such that T(n) <= cf(n) for all n > n0.
Usage: The algorithm is in O(n^2) in [best, average, worst] case.
Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in less than cf(n) steps in [best, average, worst] case.
Big-oh Notation (cont)
Big-oh notation indicates an upper bound.
Example: If T(n) = 3n^2 then T(n) is in O(n^2).
Wish tightest upper bound:
While T(n) = 3n^2 is in O(n^3), we prefer O(n^2).
Big-Oh Examples
Example 1: Finding value X in an array (average cost).
T(n) = c_s n/2.
For all values of n > 1, c_s n/2 <= c_s n.
Therefore, by the definition, T(n) is in O(n) for n0 = 1 and c = c_s.
Big-Oh Examples
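Example 2: T(n) = c1 n^2 + c2 n in average case.
c1 n^2 + c2 n <= c1 n^2 + c2 n^2 <= (c1 + c2) n^2 for all n > 1.
Therefore, T(n) is in O(n^2) for c = c1 + c2 and n0 = 1.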
A Common Misunderstanding
"The best case for my algorithm is n = 1 because that is the fastest." WRONG!
Big-oh refers to a growth rate as n grows to infinity.
Best case is defined as which input of size n is cheapest among all inputs of size n.
Big-Omega
Definition: For T(n) a non-negatively valued function, T(n) is in the set Ω(g(n)) if there exist two positive constants c and n0 such that T(n) >= cg(n) for all n > n0.
Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in more than cg(n) steps.
Lower bound.
Big-Omega Example
T(n) = c1 n^2 + c2 n.
c1 n^2 + c2 n >= c1 n^2 for all n > 1.
T(n) >= c n^2 for c = c1 and n0 = 1.
Therefore, T(n) is in Ω(n^2) by the definition.
We want the greatest lower bound.
Theta Notation
When big-Oh and Ω meet, we indicate this by using Θ (big-Theta) notation.
Definition: An algorithm is said to be Θ(h(n)) if it is in O(h(n)) and it is in Ω(h(n)).
A Common Misunderstanding
Confusing worst case with upper bound.
Upper bound refers to a growth rate.
Worst case refers to the worst input from among the choices for possible inputs of a given size.
Simplifying Rules
1. If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)).
2. If f(n) is in O(kg(n)) for any constant k > 0, then f(n) is in O(g(n)).
3. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))).
4. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then f1(n)f2(n) is in O(g1(n)g2(n)).
Running Time Examples (1)
Example 1: a = b;
This assignment takes constant time, so it is Θ(1).
Example 2:
sum = 0;
for (i=1; i
Running Time Examples (2)
Example 3:
sum = 0;
for (i=1; i
Running Time Examples (3)
Example 4:
sum1 = 0;
for (i=1; i
Running Time Examples (4)
Example 5:
sum1 = 0;
for (k=1; k
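The loop bodies of these running-time examples are cut off in this copy, so the functions below are stand-ins rather than the original examples. They are assumed loop shapes, with hypothetical names, that show three common growth rates by counting how many times the innermost statement runs.

```cpp
#include <cassert>

// Inner bound is n on every pass: n * n steps, Theta(n^2).
long countSquare(int n) {
  assert(n >= 0);
  long sum = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n; j++)
      sum++;
  return sum;
}

// Inner bound follows the outer index: n(n+1)/2 steps, still Theta(n^2).
long countTriangle(int n) {
  assert(n >= 0);
  long sum = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      sum++;
  return sum;
}

// Outer index doubles each pass, so it runs about log n times;
// the inner loop runs n times per pass: Theta(n log n).
long countDoubling(int n) {
  assert(n >= 0);
  long sum = 0;
  for (int k = 1; k <= n; k *= 2)
    for (int j = 1; j <= n; j++)
      sum++;
  return sum;
}
```

For n = 16 the doubling loop makes 5 outer passes (k = 1, 2, 4, 8, 16), so it counts 80 steps, compared with 256 for the square loop.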
Binary Search
How many elements are examined in worst case?
Binary Search
// Return position of element in sorted
// array of size n with value K.
int binary(int array[], int n, int K) {
  int l = -1;
  int r = n;                      // l, r are beyond array bounds
  while (l+1 != r) {              // Stop when l, r meet
    int i = (l+r)/2;              // Check middle
    if (K < array[i]) r = i;      // Left half
    if (K == array[i]) return i;  // Found it
    if (K > array[i]) l = i;      // Right half
  }
  return n;                       // Search value not in array
}
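One way to answer the worst-case question empirically is to instrument the search. The sketch below follows the same search logic as the function above but adds a hypothetical `examined` counter (the function name is also an assumption). Because each probe roughly halves the remaining range, the count stays near log2(n) + 1 in the worst case.

```cpp
#include <cassert>

// Binary search over sorted array[0..n-1] for K.
// Returns the position of K, or n if K is absent.
// `examined` reports how many array elements were inspected.
int binaryCounted(const int array[], int n, int K, int& examined) {
  assert(n >= 0);
  examined = 0;
  int l = -1;
  int r = n;                    // l and r are beyond the array bounds
  while (l + 1 != r) {          // stop when l and r meet
    int i = (l + r) / 2;        // middle of the remaining range
    examined++;
    if (K == array[i]) return i;
    if (K < array[i]) r = i;    // continue in left half
    else l = i;                 // continue in right half
  }
  return n;                     // K is not in the array
}
```

On a 16-element array, searching for a value larger than every element inspects 5 elements, compared with 16 for a sequential search.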
Other Control Statements
while loop: Analyze like a for loop.
if statement: Take greater complexity of then/else clauses.
switch statement: Take complexity of most expensive case.
Subroutine call: Complexity of the subroutine.
Analyzing Problems
Upper bound: Upper bound of best known algorithm.
Lower bound: Lower bound for every possible algorithm.
Analyzing Problems: Example
Common misunderstanding: No distinction between upper/lower bound when you know the exact running time.
Example of imperfect knowledge: Sorting
1. Cost of I/O: Ω(n).
2. Bubble or insertion sort: O(n^2).
3. A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n).
4. We prove later that sorting is Ω(n log n).
Multiple Parameters
Compute the rank ordering for all C pixel values in a picture of P pixels.
for (i=0; i
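The code on this slide is cut off in this copy. As a stand-in, here is one plausible way to compute such a rank ordering, written as a sketch under assumptions: pixel values are integers in [0, C), and the function name and use of `std::vector` are choices made for this example. It shows why the cost depends on both parameters: counting takes Θ(P) and sorting the counts takes O(C log C), for Θ(P + C log C) overall.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the pixel values 0..C-1 ordered from most to least frequent.
// `pixels` holds P values, each assumed to lie in [0, C).
std::vector<int> rankValues(const std::vector<int>& pixels, int C) {
  assert(C >= 0);
  std::vector<long> count(C, 0L);          // Theta(C) initialization
  for (int v : pixels) {                   // Theta(P) counting pass
    assert(v >= 0 && v < C);
    count[v]++;
  }
  std::vector<int> order(C);
  for (int i = 0; i < C; i++) order[i] = i;
  // O(C log C) sort by descending count; ties broken by smaller value.
  std::sort(order.begin(), order.end(), [&count](int a, int b) {
    return count[a] != count[b] ? count[a] > count[b] : a < b;
  });
  return order;
}
```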
Space Bounds
Space bounds can also be analyzed with asymptotic complexity analysis.
Time: Algorithm
Space: Data Structure
Space/Time Tradeoff Principle
One can often reduce time if one is willing to sacrifice space, or vice versa.
  Encoding or packing information
    Boolean flags
  Table lookup
    Factorials
Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.
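The factorial case makes the table-lookup tradeoff concrete. The sketch below, with hypothetical function names, assumes a 32-bit `int`, for which 12! is the largest factorial that fits. Recomputing costs no extra space but Θ(n) time per call; the table costs 13 stored integers and answers in Θ(1).

```cpp
#include <cassert>

// 12! = 479001600 is the largest factorial that fits in a 32-bit int.
const int kMaxFact = 12;

// Recompute on every call: no extra space, Theta(n) time.
int factLoop(int n) {
  assert(n >= 0 && n <= kMaxFact);
  int result = 1;
  for (int i = 2; i <= n; i++) result *= i;
  return result;
}

// Table lookup: 13 stored ints, Theta(1) time per call.
int factTable(int n) {
  static const int table[kMaxFact + 1] = {
      1, 1, 2, 6, 24, 120, 720, 5040, 40320,
      362880, 3628800, 39916800, 479001600};
  assert(n >= 0 && n <= kMaxFact);
  return table[n];
}
```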
Lists
A list is a finite, ordered sequence of data items.
Important concept: List elements have a position.
Notation:
What operations should we implement?
List Implementation Concepts
Our list implementation will support the concept of a current position.
We will do this by defining the list in terms of left and right partitions.
  Either or both partitions may be empty.
Partitions are separated by the fence.
List ADT
template <class Elem> class List {
public:
virtual void clear() = 0;
virtual bool insert(const Elem&) = 0;
virtual bool append(const Elem&) = 0;
virtual bool remove(Elem&) = 0;
virtual void setStart() = 0;
virtual void setEnd() = 0;
virtual void prev() = 0;
virtual void next() = 0;
List ADT (cont)
virtual int leftLength() const = 0;
virtual int rightLength() const = 0;
virtual bool setPos(int pos) = 0;
virtual bool getValue(Elem&) const = 0;
virtual void print() const = 0;
};
List ADT Examples
List:
MyList.insert(99);
Result:
Iterate through the whole list:
for (MyList.setStart(); MyList.getValue(it); MyList.next())
  DoSomething(it);
List Find Function
// Return true iff K is in list
bool find(List<int>& L, int K) {
  int it;
  for (L.setStart(); L.getValue(it); L.next())
    if (K == it) return true;  // Found it
  return false;                // Not found
}
Array-Based List Insert
Array-Based List Class (1)
template <class Elem> // Array-based list
class AList : public List<Elem> {
private:
  int maxSize;      // Maximum size of list
  int listSize;     // Actual elem count
  int fence;        // Position of fence
  Elem* listArray;  // Array holding list
public:
  AList(int size=DefaultListSize) {
    maxSize = size;
    listSize = fence = 0;
    listArray = new Elem[maxSize];
  }
Array-Based List Class (2)
  ~AList() { delete [] listArray; }
  void clear() {
    delete [] listArray;
    listSize = fence = 0;
    listArray = new Elem[maxSize];
  }
  void setStart() { fence = 0; }
  void setEnd() { fence = listSize; }
  void prev() { if (fence != 0) fence--; }
  void next() { if (fence
Array-Based List Class (3)
  bool setPos(int pos) {
    if ((pos >= 0) && (pos <= listSize)) fence = pos;
    return (pos >= 0) && (pos <= listSize);
  }
Insert
// Insert at front of right partition
template <class Elem>
bool AList<Elem>::insert(const Elem& item) {
  if (listSize == maxSize) return false;
  for(int i=listSize; i>fence; i--)
    // Shift Elems up to make room
    listArray[i] = listArray[i-1];
  listArray[fence] = item;
  listSize++;  // Increment list size
  return true;
}
Append
// Append Elem to end of the list
template <class Elem>
bool AList<Elem>::append(const Elem& item) {
  if (listSize == maxSize) return false;
  listArray[listSize++] = item;
  return true;
}
Remove
// Remove and return first Elem in right
// partition
template <class Elem>
bool AList<Elem>::remove(Elem& it) {
  if (rightLength() == 0) return false;
  it = listArray[fence];  // Copy Elem
  for(int i=fence; i
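Several array-based list methods are cut off in this copy. The following compact class is a self-contained sketch of how the fence-based operations could fit together; it is not the original class. The class name is hypothetical, and the bodies of `next`, `leftLength`, `rightLength`, `getValue`, and the shifting loop in `remove` are assumptions filled in from the behavior described above.

```cpp
#include <cassert>

// Sketch of an array-based list whose current position is a fence
// between a left partition and a right partition.
template <class Elem>
class SketchAList {
private:
  int maxSize;      // Maximum size of list
  int listSize;     // Actual element count
  int fence;        // Position of fence
  Elem* listArray;  // Array holding list
public:
  explicit SketchAList(int size) : maxSize(size), listSize(0), fence(0) {
    assert(size > 0);
    listArray = new Elem[maxSize];
  }
  ~SketchAList() { delete [] listArray; }
  SketchAList(const SketchAList&) = delete;             // no copying here
  SketchAList& operator=(const SketchAList&) = delete;

  void setStart() { fence = 0; }
  void next() { if (fence < listSize) fence++; }
  int leftLength() const { return fence; }
  int rightLength() const { return listSize - fence; }

  bool insert(const Elem& item) {        // Theta(n): shifts elements up
    if (listSize == maxSize) return false;
    for (int i = listSize; i > fence; i--)
      listArray[i] = listArray[i-1];
    listArray[fence] = item;
    listSize++;
    return true;
  }
  bool remove(Elem& it) {                // Theta(n): shifts elements down
    if (rightLength() == 0) return false;
    it = listArray[fence];
    for (int i = fence; i < listSize - 1; i++)
      listArray[i] = listArray[i+1];
    listSize--;
    return true;
  }
  bool getValue(Elem& it) const {        // Theta(1): direct access
    if (rightLength() == 0) return false;
    it = listArray[fence];
    return true;
  }
};
```

Inserting 3, 2, 1 at the start yields the list 1, 2, 3; after one `next()`, `remove` returns 2 and leaves 3 as the first element of the right partition.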
Link Class
Dynamic allocation of new list elements.
// Singly-linked list node
template <class Elem> class Link {
public:
  Elem element;  // Value for this node
  Link *next;    // Pointer to next node
  Link(const Elem& elemval, Link* nextval =NULL)
    { element = elemval; next = nextval; }
  Link(Link* nextval =NULL)
    { next = nextval; }
};
Linked List Position (1)
Linked List Position (2)
Linked List Class (1)
// Linked list implementation
template <class Elem> class LList: public List<Elem> {
private:
  Link<Elem>* head;   // Pointer to list header
  Link<Elem>* tail;   // Pointer to last Elem
  Link<Elem>* fence;  // Last element on left
  int leftcnt;        // Size of left
  int rightcnt;       // Size of right
  void init() {       // Initialization routine
    fence = tail = head = new Link<Elem>;
    leftcnt = rightcnt = 0;
  }
Linked List Class (2)
  void removeall() {  // Return link nodes to free store
    while(head != NULL) {
      fence = head;
      head = head->next;
      delete fence;
    }
  }
public:
  LList(int size=DefaultListSize) { init(); }
  ~LList() { removeall(); }  // Destructor
  void clear() { removeall(); init(); }
Linked List Class (3)
  void setStart() {
    fence = head; rightcnt += leftcnt;
    leftcnt = 0; }
  void setEnd() {
    fence = tail; leftcnt += rightcnt;
    rightcnt = 0; }
  void next() {
    // Don't move fence if right empty
    if (fence != tail) {
      fence = fence->next; rightcnt--;
      leftcnt++; }
  }
  int leftLength() const { return leftcnt; }
  int rightLength() const { return rightcnt; }
  bool getValue(Elem& it) const {
    if(rightLength() == 0) return false;
    it = fence->next->element;
    return true; }
Insertion
Insert/Append
// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
  fence->next = new Link<Elem>(item, fence->next);
  if (tail == fence) tail = fence->next;
  rightcnt++;
  return true;
}

// Append Elem to end of the list
template <class Elem>
bool LList<Elem>::append(const Elem& item) {
  tail = tail->next = new Link<Elem>(item, NULL);
  rightcnt++;
  return true;
}
Removal
Remove
// Remove and return first Elem in right
// partition
template <class Elem>
bool LList<Elem>::remove(Elem& it) {
  if (fence->next == NULL) return false;
  it = fence->next->element;        // Remember val
  // Remember link node
  Link<Elem>* ltemp = fence->next;
  fence->next = ltemp->next;        // Remove
  if (tail == ltemp) tail = fence;  // Reset tail
  delete ltemp;                     // Reclaim space
  rightcnt--;
  return true;
}
Prev
// Move fence one step left;
// no change if left is empty
template <class Elem>
void LList<Elem>::prev() {
  Link<Elem>* temp = head;
  if (fence == head) return;  // No prev Elem
  while (temp->next != fence)
    temp = temp->next;
  fence = temp;
  leftcnt--;
  rightcnt++;
}
Setpos
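// Set the size of left partition to pos
template <class Elem>
bool LList<Elem>::setPos(int pos) {
  if ((pos < 0) || (pos > rightcnt+leftcnt))
    return false;
  rightcnt = rightcnt + leftcnt - pos;  // Keep counts in step with fence
  leftcnt = pos;
  fence = head;
  for(int i=0; i<pos; i++)
    fence = fence->next;
  return true;
}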
Comparison of Implementations
Array-Based Lists:
  Insertion and deletion are Θ(n).
  Prev and direct access are Θ(1).
  Array must be allocated in advance.
  No overhead if all array positions are full.
Linked Lists:
  Insertion and deletion are Θ(1).
  Prev and direct access are Θ(n).
  Space grows with number of elements.
  Every element requires overhead.
Space Comparison
Break-even point:
  DE = n(P + E);
  n = DE / (P + E)
E: Space for data value.
P: Space for pointer.
D: Number of elements in array.
Freelists
System new and delete are slow.
// Singly-linked list node with freelist
template <class Elem> class Link {
private:
  static Link<Elem>* freelist;  // Head
public:
  Elem element;  // Value for this node
  Link* next;    // Point to next node
  Link(const Elem& elemval, Link* nextval =NULL)
    { element = elemval; next = nextval; }
  Link(Link* nextval =NULL) { next = nextval; }
  void* operator new(size_t);   // Overload
  void operator delete(void*);  // Overload
};
Freelists (2)
template <class Elem>
Link<Elem>* Link<Elem>::freelist = NULL;

template <class Elem>  // Overload for new
void* Link<Elem>::operator new(size_t) {
  if (freelist == NULL) return ::new Link;
  Link<Elem>* temp = freelist;  // Reuse
  freelist = freelist->next;
  return temp;                  // Return the link
}

template <class Elem>  // Overload delete
void Link<Elem>::operator delete(void* ptr) {
  ((Link<Elem>*)ptr)->next = freelist;
  freelist = (Link<Elem>*)ptr;
}
Doubly Linked Lists
Simplify insertion and deletion: Add a prev pointer.
// Doubly-linked list link node
template <class Elem> class Link {
public:
  Elem element;  // Value for this node
  Link *next;    // Pointer to next node
  Link *prev;    // Pointer to previous node
  Link(const Elem& e, Link* prevp =NULL, Link* nextp =NULL)
    { element=e; prev=prevp; next=nextp; }
  Link(Link* prevp =NULL, Link* nextp =NULL)
    { prev = prevp; next = nextp; }
};
Doubly Linked Lists
Doubly Linked Insert
Doubly Linked Insert
// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
  fence->next = new Link<Elem>(item, fence, fence->next);
  if (fence->next->next != NULL)
    fence->next->next->prev = fence->next;
  if (tail == fence)     // Appending new Elem
    tail = fence->next;  // so set tail
  rightcnt++;            // Added to right
  return true;
}
Doubly Linked Remove
Doubly Linked Remove
// Remove, return first Elem in right part
template <class Elem>
bool LList<Elem>::remove(Elem& it) {
  if (fence->next == NULL) return false;
  it = fence->next->element;
  Link<Elem>* ltemp = fence->next;
  if (ltemp->next != NULL)
    ltemp->next->prev = fence;
  else tail = fence;          // Reset tail
  fence->next = ltemp->next;  // Remove
  delete ltemp;               // Reclaim space
  rightcnt--;                 // Removed from right
  return true;
}
Dictionary
Often want to insert records, delete records, search for records.
Required concepts:
  Search key: Describe what we are looking for
  Key comparison
    Equality: sequential search
    Relative order: sorting
  Record comparison
Comparator Class
How do we generalize comparison?
  Use ==, <=, >=: Disastrous
  Overload ==, <=, >=: Disastrous
  Define a function with a standard name
    Implied obligation
    Breaks down with multiple key fields/indices for same object
  Pass in a function
    Explicit obligation
    Function parameter
    Template parameter
Comparator Example
class intintCompare {
public:
  static bool lt(int x, int y)
    { return x < y; }
  static bool eq(int x, int y)
    { return x == y; }
  static bool gt(int x, int y)
    { return x > y; }
};
Comparator Example (2)
class Payroll {
public:
  int ID;
  char* name;
};

class IDCompare {
public:
  static bool lt(Payroll& x, Payroll& y)
    { return x.ID < y.ID; }
};

class NameCompare {
public:
  static bool lt(Payroll& x, Payroll& y)
    { return strcmp(x.name, y.name) < 0; }
};
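Passing the comparator as a template parameter lets one routine work with either ordering. The sketch below is a self-contained illustration, not code from the notes: the `Employee` record, the `ByID` and `ByName` comparators, and `minIndex` are hypothetical names that mirror the payroll example above.

```cpp
#include <cassert>
#include <cstring>

struct Employee {        // hypothetical record, mirrors the payroll example
  int ID;
  const char* name;
};

class ByID {             // order records by ID
public:
  static bool lt(const Employee& x, const Employee& y)
    { return x.ID < y.ID; }
};

class ByName {           // order records by name
public:
  static bool lt(const Employee& x, const Employee& y)
    { return std::strcmp(x.name, y.name) < 0; }
};

// Position of the smallest record under the ordering supplied by Comp.
template <class Elem, class Comp>
int minIndex(const Elem array[], int n) {
  assert(n > 0);
  int best = 0;
  for (int i = 1; i < n; i++)
    if (Comp::lt(array[i], array[best])) best = i;
  return best;
}
```

Calling `minIndex<Employee, ByID>(staff, n)` and `minIndex<Employee, ByName>(staff, n)` on the same array can return different positions, without touching the record type or overloading its operators.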
Dictionary ADT
// The Dictionary abstract class.
template <class Key, class Elem>
class Dictionary {
public:
  virtual void clear() = 0;
  virtual bool insert(const Elem&) = 0;
  virtual bool remove(const Key&, Elem&) = 0;
  virtual bool removeAny(Elem&) = 0;
  virtual bool find(const Key&, Elem&) const = 0;
  virtual int size() = 0;
};
Unsorted List Dictionary
template <class Key, class Elem, class KEComp>
class UALdict : public Dictionary<Key, Elem> {
private:
  AList<Elem>* list;
public:
  bool remove(const Key& K, Elem& e) {
    for(list->setStart(); list->getValue(e); list->next())
      if (KEComp::eq(K, e)) {
        list->remove(e);
        return true;
      }
    return false;
  }
};
Stacks
LIFO: Last In, First Out.
Restricted form of list: Insert and remove only at front of list.
Notation:
  Insert: PUSH
  Remove: POP
  The accessible element is called TOP.
Stack ADT
// Stack abstract class
template <class Elem> class Stack {
public:
  // Reinitialize the stack
  virtual void clear() = 0;
  // Push an element onto the top of the stack.
  virtual bool push(const Elem&) = 0;
  // Remove the element at the top of the stack.
  virtual bool pop(Elem&) = 0;
  // Get a copy of the top element in the stack
  virtual bool topValue(Elem&) const = 0;
  // Return the number of elements in the stack.
  virtual int length() const = 0;
};
Array-Based Stack
// Array-based stack implementation
private:
  int size;         // Maximum size of stack
  int top;          // Index for top element
  Elem *listArray;  // Array holding elements

Issues:
  Which end is the top?
  Where does "top" point to?
  What is the cost of the operations?
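One possible set of answers to these issues is sketched below; it is an assumption for illustration, not the notes' implementation, and the class name is hypothetical. Here the top is the high end of the array and `top` holds the index of the first free slot, so push, pop, and topValue each take Θ(1) time with no shifting.

```cpp
#include <cassert>

// Sketch of an array-based stack: the top is at the high end of the
// array, and `top` is the index of the first free slot.
template <class Elem>
class SketchAStack {
private:
  int size;          // Maximum size of stack
  int top;           // Index of the first free slot
  Elem* listArray;   // Array holding elements
public:
  explicit SketchAStack(int sz) : size(sz), top(0) {
    assert(sz > 0);
    listArray = new Elem[sz];
  }
  ~SketchAStack() { delete [] listArray; }
  SketchAStack(const SketchAStack&) = delete;             // no copying here
  SketchAStack& operator=(const SketchAStack&) = delete;

  void clear() { top = 0; }
  bool push(const Elem& item) {
    if (top == size) return false;   // stack is full
    listArray[top++] = item;
    return true;
  }
  bool pop(Elem& it) {
    if (top == 0) return false;      // stack is empty
    it = listArray[--top];
    return true;
  }
  bool topValue(Elem& it) const {
    if (top == 0) return false;      // stack is empty
    it = listArray[top-1];
    return true;
  }
  int length() const { return top; }
};
```

Placing the top at position 0 instead would force every push and pop to shift the remaining elements, making both Θ(n).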
Linked Stack
// Linked stack implementation
private:
  Link<Elem>* top;  // Pointer to first elem
  int size;         // Count number of elems

What is the cost of the operations?
How do space requirements compare to the array-based stack implementation?
Queues
FIFO: First in, First Out
Restricted form of list: Insert at one end, remove from the other.
Notation:
  Insert: Enqueue
  Delete: Dequeue
  First element: Front
  Last element: Rear
Queue Implementation (1)
Queue Implementation (2)
Binary Trees
A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees, called the left and right subtrees, which are disjoint from each other and from the root.
Binary Tree Example
Notation: Node, children, edge, parent, ancestor, descendant, path, depth, height, level, leaf node, internal node, subtree.
Full and Complete Binary Trees
Full binary tree: Each node is either a leaf or internal node with exactly two non-empty children.
Complete binary tree: If the height of the tree is d, then all levels except possibly level d are completely full. The bottom level has all nodes to the left side.
Full Binary Tree Theorem (1)
Theorem: The number of leaves in a non-empty full binary tree is one more than the number of internal nodes.
Proof (by Mathematical Induction):
Base case: A full binary tree with 1 internal node must have two leaf nodes.
Induction Hypothesis: Assume any full binary tree T containing n-1 internal nodes has n leaves.
Full Binary Tree Theorem (2)
Induction Step: Given tree Twith n internalnodes, pick internal node Iwith two leaf children.Remove Is children, call resulting tree T.
By induction hypothesis,T
is a full binary tree withn leaves.
Restore Is two children. The number of internalnodes has now gone up by 1 to reach n. The
number of leaves has also gone up by 1.
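The theorem can be checked empirically on small trees. The following is a minimal sketch, not code from these notes: `Node`, `makeFullTree`, `Counts`, and `countNodes` are hypothetical names, and the builder produces just one particular family of full binary trees.

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal node type (not the BinNode class used in these notes).
struct Node {
    std::unique_ptr<Node> left, right;
};

// Build a full binary tree with the requested number of internal nodes
// by repeatedly giving the leftmost leaf two children.
std::unique_ptr<Node> makeFullTree(int internalNodes) {
    auto root = std::make_unique<Node>();
    Node* leaf = root.get();
    for (int i = 0; i < internalNodes; ++i) {
        leaf->left = std::make_unique<Node>();
        leaf->right = std::make_unique<Node>();
        leaf = leaf->left.get();
    }
    return root;
}

struct Counts { int leaves = 0; int internal = 0; };

// Tally leaves and internal nodes with a preorder walk.
void countNodes(const Node* n, Counts& c) {
    if (n == nullptr) return;
    if (!n->left && !n->right) { ++c.leaves; return; }
    ++c.internal;
    countNodes(n->left.get(), c);
    countNodes(n->right.get(), c);
}
```

For every tree built this way, `leaves` should come out exactly one larger than `internal`.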
Full Binary Tree Corollary
Theorem: The number of null pointers in a non-empty binary tree is one more than the number of nodes in the tree.
Proof: Replace all null pointers with a pointer to an empty leaf node. This is a full binary tree.
Binary Tree Node Class (1)
// Binary tree node class
template <class Elem>
class BinNodePtr : public BinNode<Elem> {
private:
  Elem it;                 // The node's value
  BinNodePtr* lc;          // Pointer to left child
  BinNodePtr* rc;          // Pointer to right child
public:
  BinNodePtr() { lc = rc = NULL; }
  BinNodePtr(Elem e, BinNodePtr* l = NULL,
                     BinNodePtr* r = NULL)
    { it = e; lc = l; rc = r; }
Binary Tree Node Class (2)
  Elem& val() { return it; }
  void setVal(const Elem& e) { it = e; }
  inline BinNode<Elem>* left() const { return lc; }
  void setLeft(BinNode<Elem>* b) { lc = (BinNodePtr*)b; }
  inline BinNode<Elem>* right() const { return rc; }
  void setRight(BinNode<Elem>* b) { rc = (BinNodePtr*)b; }
  bool isLeaf()
    { return (lc == NULL) && (rc == NULL); }
};
Traversals (1)
Any process for visiting the nodes in some order is called a traversal.
Any traversal that lists every node in the tree exactly once is called an enumeration of the tree's nodes.
Traversals (2)
Preorder traversal: Visit each node before visiting its children.
Postorder traversal: Visit each node after visiting its children.
Inorder traversal: Visit the left subtree, then the node, then the right subtree.
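The three orders differ only in where the visit happens relative to the two recursive calls. A minimal sketch follows, assuming a hypothetical `TNode` struct with a one-character label (it is not the BinNode class from these notes) and with "visit" meaning "append the label to a string".

```cpp
#include <string>

// Hypothetical node type for illustration only.
struct TNode {
    char label;
    TNode* left;
    TNode* right;
};

// Visit the node, then its children.
void preorder(const TNode* n, std::string& out) {
    if (!n) return;
    out += n->label;
    preorder(n->left, out);
    preorder(n->right, out);
}

// Visit the left subtree, then the node, then the right subtree.
void inorder(const TNode* n, std::string& out) {
    if (!n) return;
    inorder(n->left, out);
    out += n->label;
    inorder(n->right, out);
}

// Visit the children, then the node.
void postorder(const TNode* n, std::string& out) {
    if (!n) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out += n->label;
}
```

For a root A with left child B (whose only child is a right child D) and right child C, these give ABDC, BDAC, and DBCA respectively.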
Traversals (3)
template <class Elem>   // Good implementation
void preorder(BinNode<Elem>* subroot) {
  if (subroot == NULL) return;   // Empty
  visit(subroot);                // Perform some action
  preorder(subroot->left());
  preorder(subroot->right());
}

template <class Elem>   // Bad implementation
void preorder2(BinNode<Elem>* subroot) {
  visit(subroot);                // Perform some action
  if (subroot->left() != NULL)
    preorder2(subroot->left());
  if (subroot->right() != NULL)
    preorder2(subroot->right());
}
Traversal Example
// Return the number of nodes in the tree
template <class Elem>
int count(BinNode<Elem>* subroot) {
  if (subroot == NULL)
    return 0;   // Nothing to count
  return 1 + count(subroot->left())
           + count(subroot->right());
}
Binary Tree Implementation (1)
Binary Tree Implementation (2)
Union Implementation (1)
enum Nodetype {leaf, internal};
class VarBinNode {   // Generic node class
public:
  Nodetype mytype;   // Store type for node
  union {
    struct {         // Internal node
      VarBinNode* left;    // Left child
      VarBinNode* right;   // Right child
      Operator opx;        // Value
    } intl;
    Operand var;     // Leaf: Value only
  };
Union Implementation (2)
  // Leaf constructor
  VarBinNode(const Operand& val)
    { mytype = leaf; var = val; }
  // Internal node constructor
  VarBinNode(const Operator& op,
             VarBinNode* l, VarBinNode* r) {
    mytype = internal; intl.opx = op;
    intl.left = l; intl.right = r;
  }
  bool isLeaf() { return mytype == leaf; }
  VarBinNode* leftchild()
    { return intl.left; }
  VarBinNode* rightchild()
    { return intl.right; }
};
Union Implementation (3)
// Preorder traversal
void traverse(VarBinNode* subroot) {
  if (subroot == NULL) return;
  if (subroot->isLeaf())
    cout << "Leaf: " << subroot->var << "\n";
  else {
    cout << "Internal: " << subroot->intl.opx << "\n";
    traverse(subroot->leftchild());
    traverse(subroot->rightchild());
  }
}
Inheritance (1)
class VarBinNode {   // Abstract base class
public:
  virtual bool isLeaf() = 0;
};

class LeafNode : public VarBinNode {   // Leaf
private:
  Operand var;   // Operand value
public:
  LeafNode(const Operand& val)
    { var = val; }   // Constructor
  bool isLeaf() { return true; }
  Operand value() { return var; }
};
Inheritance (2)
// Internal node
class IntlNode : public VarBinNode {
private:
  VarBinNode* left;    // Left child
  VarBinNode* right;   // Right child
  Operator opx;        // Operator value
public:
  IntlNode(const Operator& op,
           VarBinNode* l, VarBinNode* r)
    { opx = op; left = l; right = r; }
  bool isLeaf() { return false; }
  VarBinNode* leftchild() { return left; }
  VarBinNode* rightchild() { return right; }
  Operator value() { return opx; }
};
Inheritance (3)
// Preorder traversal
void traverse(VarBinNode *subroot) {
  if (subroot == NULL) return;   // Empty
  if (subroot->isLeaf())         // Do leaf node
    cout
Composite (1)
class VarBinNode {   // Abstract base class
public:
  virtual bool isLeaf() = 0;
  virtual void trav() = 0;
};

class LeafNode : public VarBinNode {   // Leaf
private:
  Operand var;   // Operand value
public:
  LeafNode(const Operand& val)
    { var = val; }   // Constructor
  bool isLeaf() { return true; }
  Operand value() { return var; }
  void trav() { cout
Composite (2)
class IntlNode : public VarBinNode {
private:
  VarBinNode* lc;   // Left child
  VarBinNode* rc;   // Right child
  Operator opx;     // Operator value
public:
  IntlNode(const Operator& op,
           VarBinNode* l, VarBinNode* r)
    { opx = op; lc = l; rc = r; }
  bool isLeaf() { return false; }
  VarBinNode* left() { return lc; }
  VarBinNode* right() { return rc; }
  Operator value() { return opx; }
  void trav() {
    cout
Composite (3)
// Preorder traversal
void traverse(VarBinNode *root) {
  if (root != NULL)
    root->trav();
}
Space Overhead (1)
From the Full Binary Tree Theorem: Half of the pointers are null.
If leaves store only data, then overhead depends on whether the tree is full.
Ex: All nodes the same, with two pointers to children (p is the size of a pointer, d the size of a data field, n the number of nodes):
  Total space required is (2p + d)n
  Overhead: 2pn
  If p = d, this means 2p/(2p + d) = 2/3 overhead.
Space Overhead (2)
Eliminate pointers from the leaf nodes:
  $\frac{(n/2)(2p)}{(n/2)(2p) + dn} = \frac{p}{p + d}$
This is 1/2 if p = d.
If data is stored only at the leaves, the overhead is 2p/(2p + d), or 2/3 if p = d.
Note that some method is needed to distinguish leaves from internal nodes.
Array Implementation (1)
Position        0   1   2   3   4   5   6   7   8   9  10  11
Parent         --   0   0   1   1   2   2   3   3   4   4   5
Left Child      1   3   5   7   9  11  --  --  --  --  --  --
Right Child     2   4   6   8  10  --  --  --  --  --  --  --
Left Sibling   --  --   1  --   3  --   5  --   7  --   9  --
Right Sibling  --   2  --   4  --   6  --   8  --  10  --  --
Array Implementation (1)
Parent(r) =
Leftchild(r) =
Rightchild(r) =
Leftsibling(r) =
Rightsibling(r) =
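One way to fill in these formulas, consistent with the 12-node table for the array implementation, is sketched below. The function names and the convention of returning -1 where the table shows "--" are assumptions made for this example; n is the number of nodes stored in the array.

```cpp
#include <cassert>

// Index arithmetic for a complete binary tree stored in positions 0..n-1.
// Each function returns -1 when the requested relative does not exist.
int parent(int r, int n)       { return (r > 0 && r < n) ? (r - 1) / 2 : -1; }
int leftChild(int r, int n)    { return (2 * r + 1 < n) ? 2 * r + 1 : -1; }
int rightChild(int r, int n)   { return (2 * r + 2 < n) ? 2 * r + 2 : -1; }
// Even positions (other than the root) are right children, so they have a left sibling.
int leftSibling(int r, int n)  { return (r > 0 && r < n && r % 2 == 0) ? r - 1 : -1; }
// Odd positions are left children; the sibling exists only if r + 1 is in range.
int rightSibling(int r, int n) { return (r % 2 == 1 && r + 1 < n) ? r + 1 : -1; }
```

With n = 12 these reproduce every row of the table, including the missing right child of position 5 and the missing right sibling of position 11.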
Binary Search Trees
BST Property: All elements stored in the left subtree of a node with value K have values < K. All elements stored in the right subtree of a node with value K have values >= K.
BST ADT(1)
// BST implementation for the Dictionary ADT
template <class Key, class Elem, class KEComp, class EEComp>
class BST : public Dictionary<Key, Elem, KEComp, EEComp> {
private:
  BinNode<Elem>* root;   // Root of the BST
  int nodecount;         // Number of nodes
  void clearhelp(BinNode<Elem>*);
  BinNode<Elem>* inserthelp(BinNode<Elem>*, const Elem&);
  BinNode<Elem>* deletemin(BinNode<Elem>*, BinNode<Elem>*&);
  BinNode<Elem>* removehelp(BinNode<Elem>*, const Key&,
                            BinNode<Elem>*&);
  bool findhelp(BinNode<Elem>*, const Key&, Elem&) const;
  void printhelp(BinNode<Elem>*, int) const;
BST ADT(2)
public:
  BST() { root = NULL; nodecount = 0; }
  ~BST() { clearhelp(root); }
  void clear() { clearhelp(root); root = NULL; nodecount = 0; }
  bool insert(const Elem& e) {
    root = inserthelp(root, e);
    nodecount++;
    return true;
  }
  bool remove(const Key& K, Elem& e) {
    BinNode<Elem>* t = NULL;
    root = removehelp(root, K, t);
    if (t == NULL) return false;
    e = t->val();
    nodecount--;
    delete t;
    return true;
  }
BST ADT(3)
  bool removeAny(Elem& e) {   // Delete min value
    if (root == NULL) return false;   // Empty
    BinNode<Elem>* t;
    root = deletemin(root, t);
    e = t->val();
    delete t;
    nodecount--;
    return true;
  }
  bool find(const Key& K, Elem& e) const
    { return findhelp(root, K, e); }
  int size() { return nodecount; }
  void print() const {
    if (root == NULL)
      cout
template <class Key, class Elem, class KEComp, class EEComp>
bool BST<Key, Elem, KEComp, EEComp>::findhelp(
    BinNode<Elem>* subroot, const Key& K, Elem& e) const {
  if (subroot == NULL) return false;
  else if (KEComp::lt(K, subroot->val()))
    return findhelp(subroot->left(), K, e);
  else if (KEComp::gt(K, subroot->val()))
    return findhelp(subroot->right(), K, e);
  else { e = subroot->val(); return true; }
}
BST Insert (1)
BST Insert (2)
template <class Key, class Elem, class KEComp, class EEComp>
BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::inserthelp(
    BinNode<Elem>* subroot, const Elem& val) {
  if (subroot == NULL)   // Empty: create node
    return new BinNodePtr<Elem>(val, NULL, NULL);
  if (EEComp::lt(val, subroot->val()))
    subroot->setLeft(inserthelp(subroot->left(), val));
  else
    subroot->setRight(inserthelp(subroot->right(), val));
  // Return subtree with node inserted
  return subroot;
}
Remove Minimum Value
template <class Key, class Elem, class KEComp, class EEComp>
BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::deletemin(
    BinNode<Elem>* subroot, BinNode<Elem>*& min) {
  if (subroot->left() == NULL) {
    min = subroot;
    return subroot->right();
  }
  else {   // Continue left
    subroot->setLeft(deletemin(subroot->left(), min));
    return subroot;
  }
}
BST Remove (1)
BST Remove (2)
template <class Key, class Elem, class KEComp, class EEComp>
BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::removehelp(
    BinNode<Elem>* subroot, const Key& K, BinNode<Elem>*& t) {
  if (subroot == NULL) return NULL;
  else if (KEComp::lt(K, subroot->val()))
    subroot->setLeft(removehelp(subroot->left(), K, t));
  else if (KEComp::gt(K, subroot->val()))
    subroot->setRight(removehelp(subroot->right(), K, t));
BST Remove (2)
  else {   // Found it: remove it
    BinNode<Elem>* temp;
    t = subroot;
    if (subroot->left() == NULL)
      subroot = subroot->right();
    else if (subroot->right() == NULL)
      subroot = subroot->left();
    else {   // Both children are non-empty
      subroot->setRight(deletemin(subroot->right(), temp));
      Elem te = subroot->val();
      subroot->setVal(temp->val());
      temp->setVal(te);
      t = temp;
    }
  }
  return subroot;
}
Cost of BST Operations
Find:
Insert:
Delete:
Heaps
Heap: Complete binary tree with the heap property:
  Min-heap: All values less than child values.
  Max-heap: All values greater than child values.
The values are partially ordered.
Heap representation: Normally the array-based complete binary tree representation.
Heap ADT
template <class Elem, class Comp>
class maxheap {
private:
  Elem* Heap;           // Pointer to the heap array
  int size;             // Maximum size of the heap
  int n;                // Number of elems now in heap
  void siftdown(int);   // Put element in place
public:
  maxheap(Elem* h, int num, int max);
  int heapsize() const;
  bool isLeaf(int pos) const;
  int leftchild(int pos) const;
  int rightchild(int pos) const;
  int parent(int pos) const;
  bool insert(const Elem&);
  bool removemax(Elem&);
  bool remove(int, Elem&);
  void buildHeap();
};
Building the Heap
(a) (4-2) (4-1) (2-1) (5-2) (5-4) (6-3) (6-5) (7-5) (7-6)
(b) (5-2), (7-3), (7-1), (6-1)
Siftdown (1)
For fast heap construction:
  Work from high end of array to low end.
  Call siftdown for each item.
  Don't need to call siftdown on leaf nodes.

template <class Elem, class Comp>
void maxheap<Elem, Comp>::siftdown(int pos) {
  while (!isLeaf(pos)) {
    int j = leftchild(pos);
    int rc = rightchild(pos);
    if ((rc
Buildheap Cost
Cost for heap construction:
  $\sum_{i=1}^{\log n} (i - 1)\frac{n}{2^i} \approx n$
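The summation for heap construction cost can be evaluated directly to see that it stays just below n. This is a small sketch under the assumption that n is a power of two, so every term n/2^i is an exact integer; the function name `buildHeapWork` is hypothetical.

```cpp
#include <cassert>

// Evaluate sum_{i=1}^{log2 n} (i - 1) * n / 2^i for n a power of two.
long buildHeapWork(long n) {
    long total = 0;
    int i = 1;
    // width takes the values n/2, n/4, ..., 1, i.e. n / 2^i for i = 1, 2, ...
    for (long width = n / 2; width >= 1; width /= 2, ++i)
        total += (i - 1) * width;
    return total;
}
```

For n = 2^k the closed form works out to n - (k + 1), so the total is linear in n rather than n log n.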
Remove Max Value
template <class Elem, class Comp>
bool maxheap<Elem, Comp>::removemax(Elem& it) {
  if (n == 0) return false;   // Heap is empty
  swap(Heap, 0, --n);         // Swap max with end
  if (n != 0) siftdown(0);
  it = Heap[n];               // Return max value
  return true;
}
Priority Queues (1)
A priority queue stores objects, and on request releases the object with greatest value.
  Example: Scheduling jobs in a multi-tasking operating system.
The priority of a job may change, requiring some reordering of the jobs.
Implementation: Use a heap to store the priority queue.
Priority Queues (2)
To support priority reordering, delete and re-insert. Need to know index for the object in question.

template <class Elem, class Comp>
bool maxheap<Elem, Comp>::remove(int pos, Elem& it) {
  if ((pos < 0) || (pos >= n)) return false;
  swap(Heap, pos, --n);              // Swap with last value
  if (pos != n) {                    // Nothing to repair if the last value was removed
    while ((pos != 0) && (Comp::gt(Heap[pos],
                                   Heap[parent(pos)]))) {
      swap(Heap, pos, parent(pos));  // Push up if large key
      pos = parent(pos);             // Follow the value upward
    }
    siftdown(pos);                   // Push down if small key
  }
  it = Heap[n];
  return true;
}
Huffman Coding Trees
ASCII codes: 8 bits per character.
  Fixed-length coding.
Can take advantage of relative frequency of letters to save space.
  Variable-length coding
Build the tree with minimum external path weight.

Letter   Z   K   F   C   U   D   L   E
Freq     2   7  24  32  37  42  42 120
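The minimum external path weight can be computed without materializing the tree: repeatedly merge the two smallest weights, and add each merged weight to a running total. The sketch below uses `std::priority_queue` as a min-heap; the function name `huffmanCost` is an assumption for this example, and it returns only the total weighted path length, not the codes themselves.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Total weighted external path length of a Huffman tree built from freqs.
// Each merge creates an internal node whose weight is counted once,
// which sums to (frequency x depth) over all leaves.
long huffmanCost(const std::vector<long>& freqs) {
    std::priority_queue<long, std::vector<long>, std::greater<long>>
        pq(freqs.begin(), freqs.end());
    long cost = 0;
    while (pq.size() > 1) {
        long a = pq.top(); pq.pop();   // smallest weight
        long b = pq.top(); pq.pop();   // second smallest weight
        cost += a + b;
        pq.push(a + b);                // new internal node
    }
    return cost;
}
```

For the eight letter frequencies in the table the total is 785 bits over 306 letters, or roughly 2.57 bits per letter.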
Huffman Tree Construction (1)
Huffman Tree Construction (2)
Assigning Codes
Letter  Freq  Code  Bits
C         32
D         42
E        120
F         24
K          7
L         42
U         37
Z          2
Coding and Decoding
A set of codes is said to meet the prefix property if no code in the set is the prefix of another.
Code for DEED:
Decode 1011001110111101:
Expected cost per letter:
General Trees
General Tree Node
// General tree node ADT
template <class Elem>
class GTNode {
public:
  GTNode(const Elem&);        // Constructor
  ~GTNode();                  // Destructor
  Elem value();               // Return value
  bool isLeaf();              // TRUE if is a leaf
  GTNode* parent();           // Return parent
  GTNode* leftmost_child();   // First child
  GTNode* right_sibling();    // Right sibling
  void setValue(Elem&);       // Set value
  void insert_first(GTNode<Elem>* n);
  void insert_next(GTNode<Elem>* n);
  void remove_first();        // Remove first child
  void remove_next();         // Remove sibling
};
General Tree Traversal
template <class Elem>
void GenTree<Elem>::printhelp(GTNode<Elem>* subroot) {
  if (subroot->isLeaf()) cout
Equivalence Class Problem
The parent pointer representation is good for answering: Are two elements in the same tree?

// Return TRUE if nodes in different trees
bool Gentree::differ(int a, int b) {
  int root1 = FIND(a);     // Find root for a
  int root2 = FIND(b);     // Find root for b
  return root1 != root2;   // Compare roots
}
Union/Find
void Gentree::UNION(int a, int b) {
  int root1 = FIND(a);   // Find root for a
  int root2 = FIND(b);   // Find root for b
  if (root1 != root2) array[root2] = root1;
}

int Gentree::FIND(int curr) const {
  while (array[curr] != ROOT) curr = array[curr];
  return curr;   // At root
}

Want to keep the depth small.
Weighted union rule: Join the tree with fewer nodes to the tree with more nodes.
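The weighted union rule needs one extra piece of bookkeeping: the node count of each tree, stored at its root. The following is a minimal sketch, not the Gentree class from these notes; the class name `DisjointSets`, its members, and the use of -1 as the root marker (playing the role of ROOT) are assumptions for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Parent-pointer forest with the weighted union rule.
class DisjointSets {
    std::vector<int> parent;   // parent[i] == -1 marks a root
    std::vector<int> size;     // size[r] is valid only when r is a root
public:
    explicit DisjointSets(int n) : parent(n, -1), size(n, 1) {}

    // Follow parent pointers up to the root.
    int find(int curr) const {
        while (parent[curr] != -1) curr = parent[curr];
        return curr;
    }

    // Join the tree with fewer nodes to the tree with more nodes.
    void unite(int a, int b) {
        int r1 = find(a), r2 = find(b);
        if (r1 == r2) return;
        if (size[r1] < size[r2]) std::swap(r1, r2);
        parent[r2] = r1;
        size[r1] += size[r2];
    }

    bool differ(int a, int b) const { return find(a) != find(b); }
};
```

Because the smaller tree always hangs below the larger root, a node's depth grows only when its tree at least doubles in size, which keeps the depth at O(log n).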
Equiv Class Processing (1)
Equiv Class Processing (2)
Path Compression
int Gentree::FIND(int curr) const {
  if (array[curr] == ROOT) return curr;
  return array[curr] = FIND(array[curr]);
}
Lists of Children
Leftmost Child/Right Sibling (1)
Leftmost Child/Right Sibling (2)
Linked Implementations (1)
Linked Implementations (2)
Converting to a Binary Tree
Left child/right sibling representation essentially stores a binary tree.
Use this process to convert any general tree to a binary tree.
A forest is a collection of one or more general trees.
Sequential Implementations (1)
List node values in the order they would be visited by a preorder traversal.
Saves space, but allows only sequential access.
Need to retain tree structure for reconstruction.
Example: For binary trees, use a symbol to mark null links.
  AB/D//CEG///FH//I//
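The null-link marking scheme is enough to rebuild the tree, because each '/' tells the reader exactly where a subtree ends. A minimal round-trip sketch follows; `SNode`, `serialize`, and `deserialize` are hypothetical names, and node values are assumed to be single characters other than '/'.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node type with a one-character value.
struct SNode {
    char label;
    std::unique_ptr<SNode> left, right;
};

// Emit values in preorder, writing '/' for each null link.
void serialize(const SNode* n, std::string& out) {
    if (!n) { out += '/'; return; }
    out += n->label;
    serialize(n->left.get(), out);
    serialize(n->right.get(), out);
}

// Rebuild the tree from the sequential form; pos advances past what was read.
std::unique_ptr<SNode> deserialize(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] == '/') { ++pos; return nullptr; }
    auto n = std::make_unique<SNode>();
    n->label = s[pos++];
    n->left = deserialize(s, pos);
    n->right = deserialize(s, pos);
    return n;
}
```

Reading AB/D//CEG///FH//I// and writing the resulting tree back out should reproduce the same string, which shows that no structure was lost.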
Sequential Implementations (2)
Example: For full binary trees, mark nodes as leaf or internal.
  A'B'/DC'E'G/F'HI
Example: For general trees, mark the end of each subtree.
  RAC)D)E))BF)))
Sorting
Each record contains a field called the key.
  Linear order: comparison.
Measures of cost:
  Comparisons
  Swaps
Insertion Sort (1)
Insertion Sort (2)
template <class Elem, class Comp>
void inssort(Elem A[], int n) {
  for (int i=1; i<n; i++)
    for (int j=i; (j>0) &&
         (Comp::lt(A[j], A[j-1])); j--)
      swap(A, j, j-1);
}

Best Case:
Worst Case:
Average Case:
Bubble Sort (1)
Bubble Sort (2)
template <class Elem, class Comp>
void bubsort(Elem A[], int n) {
  for (int i=0; i<n-1; i++)
    for (int j=n-1; j>i; j--)
      if (Comp::lt(A[j], A[j-1]))
        swap(A, j, j-1);
}

Best Case:
Worst Case:
Average Case:
Selection Sort (1)
Selection Sort (2)
template <class Elem, class Comp>
void selsort(Elem A[], int n) {
  for (int i=0; i<n-1; i++) {
    int lowindex = i;            // Remember its index
    for (int j=n-1; j>i; j--)    // Find least
      if (Comp::lt(A[j], A[lowindex]))
        lowindex = j;            // Put it in place
    swap(A, i, lowindex);
  }
}

Best Case:
Worst Case:
Average Case:
Pointer Swapping
Summary
                 Insertion   Bubble     Selection
Comparisons:
  Best Case      Θ(n)        Θ(n^2)     Θ(n^2)
  Average Case   Θ(n^2)      Θ(n^2)     Θ(n^2)
  Worst Case     Θ(n^2)      Θ(n^2)     Θ(n^2)
Swaps:
  Best Case      0           0          Θ(n)
  Average Case   Θ(n^2)      Θ(n^2)     Θ(n)
  Worst Case     Θ(n^2)      Θ(n^2)     Θ(n)
Exchange Sorting
All of the sorts so far rely on exchanges of adjacent records.
What is the average number of exchanges required?
  There are n! permutations.
  Consider permutation X and its reverse, X'.
  Together, every pair requires n(n-1)/2 exchanges, so the average over all permutations is n(n-1)/4.
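The pairing argument can be checked numerically. A sort that only swaps adjacent records removes exactly one inversion (out-of-order pair) per swap, so the number of exchanges it needs equals the inversion count. The sketch below uses a hypothetical helper `countInversions` and assumes all keys are distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count pairs (i, j) with i < j and a[i] > a[j].
// This equals the number of adjacent exchanges needed to sort a.
long countInversions(const std::vector<int>& a) {
    long inv = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] > a[j]) ++inv;
    return inv;
}
```

Every pair of positions is inverted in exactly one of a permutation and its reverse, so the two inversion counts always add up to n(n-1)/2.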
Shellsort
// Modified version of Insertion Sort
template <class Elem, class Comp>
void inssort2(Elem A[], int n, int incr) {
  for (int i=incr; i<n; i+=incr)
    for (int j=i; (j>=incr) &&
         (Comp::lt(A[j], A[j-incr])); j-=incr)
      swap(A, j, j-incr);
}

template <class Elem, class Comp>
void shellsort(Elem A[], int n) {   // Shellsort
  for (int i=n/2; i>2; i/=2)        // For each incr
    for (int j=0; j
template <class Elem, class Comp>
void qsort(Elem A[], int i, int j) {
  if (j
template <class Elem, class Comp>
int partition(Elem A[], int l, int r, Elem& pivot) {
  do {   // Move the bounds in until they meet
    while (Comp::lt(A[++l], pivot));
    while ((r != 0) && Comp::gt(A[--r], pivot));
    swap(A, l, r);   // Swap out-of-place values
  } while (l < r);   // Stop when they cross
  swap(A, l, r);     // Reverse last swap
  return l;          // Return first pos on right
}

The cost for partition is Θ(n).
Partition Example
Quicksort Example
Cost of Quicksort
Best case: Always partition in half.
Worst case: Bad partition.
Average case:
  $T(n) = n + 1 + \frac{1}{n-1}\sum_{k=1}^{n-1}\left(T(k) + T(n-k)\right)$
Optimizations for Quicksort:
  Better Pivot
  Better algorithm for small sublists
  Eliminate recursion
Mergesort
List mergesort(List inlist) {
if (inlist.length()
template
void mergesort(Elem A[], Elem temp[],int left, int right) {int mid = (left+right)/2;if (left == right) return;mergesort(A, temp, left, mid);mergesort(A, temp, mid+1, right);
for (int i=left; i
template
void mergesort(Elem A[], Elem temp[],int left, int right) {if ((right-left) =left; i--) temp[i] = A[i];for (j=1; j
Mergesort cost:
Mergesort is also good for sorting linked lists.
Mergesort requires twice the space.
Heapsort
template
void heapsort(Elem A[], int n) { // HeapsortElem mval;maxheap H(A, n, n);for (int i=0; i
Heapsort Example (2)
Binsort (1)
A simple, efficient sort:
for (i=0; i
template
void binsort(Elem A[], int n) {List B[MaxKeyValue];Elem item;for (i=0; i
Radix Sort (2)
template
void radix(Elem A[], Elem B[],int n, int k, int r, int cnt[]) {
// cnt[i] stores # of records in bin[i]int j;for (int i=0, rtok=1; i
Radix Sort Cost
Cost: Θ(nk + rk)
How do n, k, and r relate?
If key range is small, then this can be Θ(n).
If there are n distinct keys, then the length of a key must be at least log n.
  Thus, Radix Sort is Ω(n log n) in the general case.
Empirical Comparison (1)
Empirical Comparison (2)
Sorting Lower Bound
We would like to know a lower bound for all possible sorting algorithms.
Sorting is O(n log n) (average, worst cases) because we know of algorithms with this upper bound.
Sorting I/O takes Ω(n) time.
We will now prove an Ω(n log n) lower bound for sorting.
Decision Trees
Lower Bound Proof
There are n! permutations.
A sorting algorithm can be viewed as determining which permutation has been input.
Each leaf node of the decision tree corresponds to one permutation.
A tree with n nodes has Ω(log n) levels, so the tree with n! leaves has Ω(log n!) = Ω(n log n) levels.
Which node in the decision tree corresponds to the worst case?
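The step from log n! to n log n is stated without proof. One standard way to justify it, sketched here without using Stirling's approximation, is to keep only the largest half of the terms in the sum:

```latex
\log n! \;=\; \sum_{i=1}^{n} \log i
        \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log i
        \;\ge\; \frac{n}{2}\,\log\frac{n}{2}
        \;=\; \Omega(n \log n),
\qquad
\log n! \;\le\; \sum_{i=1}^{n} \log n \;=\; n \log n .
```

The two bounds together show that log n! grows as n log n, so the deepest leaf of any decision tree for sorting is Ω(n log n) comparisons from the root.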
Primary vs. Secondary Storage
Primary storage: Main memory (RAM)
Secondary Storage: Peripheral devices
  Disk drives
  Tape drives
Comparisons
Medium   Early 1996   Mid 1997   Early 2000
RAM      $45.00       7.00       1.50
Disk     0.25         0.10       0.01
Floppy   0.50         0.36       0.25
Tape     0.03         0.01       0.001

RAM is usually volatile.
RAM is about 1/4 million times faster than disk.
Golden Rule of File Processing
Minimize the number of disk accesses!
1. Arrange information so that you get what you want with few disk accesses.
2. Arrange information to minimize future disk accesses.
An organization for data on disk is often called a file structure.
Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses.
Disk Drives
Sectors
A sector is the basic unit of I/O.
Interleaving factor: Physical distance between logically adjacent sectors on a track.
Terms
Locality of Reference: When a record is read from disk, the next request is likely to come from near the same place in the file.
Cluster: Smallest unit of file allocation, usually several sectors.
Extent: A group of physically contiguous clusters.
Internal fragmentation: Wasted space within a sector if record size does not match sector size; wasted space within a cluster if file size is not a multiple of cluster size.
Seek Time
Seek time: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track.
Track-to-track time: Minimum time to move from one track to an adjacent track.
Average Seek time: Average time to reach a track for random access.
Other Factors
Rotational Delay or Latency: Time for data to rotate under I/O head.
  One half of a rotation on average.
  At 7200 rpm, this is 8.3/2 = 4.2ms.
Transfer time: Time for data to move under the I/O head.
  At 7200 rpm: (Number of sectors read / Number of sectors per track) * 8.3ms.
Disk Spec Example
16.8 GB disk on 10 platters = 1.68GB/platter
13,085 tracks/platter
256 sectors/track
512 bytes/sector
Track-to-track seek time: 2.2 ms
Average seek time: 9.5ms
4KB clusters, 32 clusters/track.
Interleaving factor of 3.
5400RPM
Disk Access Cost Example (1)
Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.
Assume all records are on 8 contiguous tracks.
First track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4 ms
Remaining 7 tracks: 2.2 + 11.1/2 + 3 x 11.1 = 41.1 ms.
Total: 48.4 + 7 * 41.1 = 335.7ms
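The contiguous-track calculation can be written as a small cost model. The sketch below is illustrative only: the struct and function names are hypothetical, the parameter values are taken from the Disk Spec Example, and the rotation time is deliberately the rounded 11.1 ms used in the arithmetic above rather than the exact 60000/5400 ms.

```cpp
#include <cassert>
#include <cmath>

// Drive parameters from the Disk Spec Example (names are assumptions).
struct DiskSpec {
    double avgSeekMs = 9.5;        // average seek time
    double trackToTrackMs = 2.2;   // seek to an adjacent track
    double rotationMs = 11.1;      // one revolution at 5400 RPM, rounded
    int interleave = 3;            // rotations needed to read a full track
};

// Time to read one full track: seek + half-rotation latency + transfer.
double trackReadMs(const DiskSpec& d, double seekMs) {
    return seekMs + d.rotationMs / 2.0 + d.interleave * d.rotationMs;
}

// Time to read a file laid out on the given number of adjacent tracks:
// one average seek for the first track, track-to-track seeks afterwards.
double contiguousReadMs(const DiskSpec& d, int tracks) {
    if (tracks <= 0) return 0.0;
    return trackReadMs(d, d.avgSeekMs)
         + (tracks - 1) * trackReadMs(d, d.trackToTrackMs);
}
```

With the default parameters, `contiguousReadMs(d, 8)` evaluates to about 335.7 ms, matching the total above when the per-track figures are not rounded before summing.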
Disk Access Cost Example (2)
Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.
Assume all file clusters are randomly spread across the disk.
256 clusters. Cluster read time is (3 x 8)/256 of a rotation, or about 1 ms.
256 x (9.5 + 11.1/2 + ((3 x 8)/256) x 11.1) is about 4119 ms, or just over 4 seconds.
How Much to Read?
Read time for one track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4ms.
Read time for one sector: 9.5 + 11.1/2 + (1/256) x 11.1 = 15.1ms.
Read time for one byte: 9.5 + 11.1/2 = 15.05 ms.
Nearly all disk drives read/write one sector at every I/O access.
  Also referred to as a page.
Buffers
The information in a sector is stored in a buffer or cache.
If the next I/O access is to the same buffer, then no need to go to disk.
There are usually one or more input buffers and one or more output buffers.
Buffer Pools
A series of buffers used by an application to cache disk data is called a buffer pool.
Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and swapping between disk and RAM.
Buffer Pools
Organizing Buffer Pools
Which buffer should be replaced when new
8/21/2019 A Practical Introduction to Data Structures and Algorithm Analysis
221/346
Which buffer should be replaced when new data must be read?
First-in, First-out: Use the first one on the queue.
Least Frequently Used (LFU): Count buffer accesses, reuse the least used.
Least Recently Used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end.
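As one way to picture the LRU rule, here is a minimal sketch that tracks only which blocks are resident, not their contents. The class name LruTracker, its methods, and the use of -1 to mean "nothing evicted" are illustrative assumptions; block ids are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical LRU bookkeeping for a pool with a fixed number of buffers.
class LruTracker {
public:
    explicit LruTracker(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Record an access to `block`. Returns the id of the block whose
    // buffer was reused, or -1 if no buffer had to be reused.
    int access(int block) {
        auto found = where_.find(block);
        if (found != where_.end()) {
            // Already resident: bring it to the front of the list.
            order_.splice(order_.begin(), order_, found->second);
            return -1;
        }
        int evicted = -1;
        if (order_.size() == capacity_) {
            // Pool is full: reuse the buffer at the end of the list.
            evicted = order_.back();
            where_.erase(evicted);
            order_.pop_back();
        }
        order_.push_front(block);
        where_[block] = order_.begin();
        return evicted;
    }

    bool resident(int block) const { return where_.count(block) != 0; }

private:
    std::size_t capacity_;
    std::list<int> order_;  // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> where_;
};
```

With a capacity of 2, the access sequence 1, 2, 1, 3 reuses the buffer holding block 2, since block 1 was touched more recently.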
Bufferpool ADT (1)
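// Two alternative interfaces; only one would be compiled in a given program.

class BufferPool { // (1) Message Passing
public:
  // Copy sz bytes from space into the buffered store at position pos.
  virtual void insert(void* space, int sz, int pos) = 0;
  // Copy sz bytes from position pos of the buffered store into space.
  virtual void getbytes(void* space, int sz, int pos) = 0;
};

class BufferPool { // (2) Buffer Passing
public:
  // Return a pointer to the buffer holding the given block.
  virtual void* getblock(int block) = 0;
  // Mark the block as modified so it is rewritten when flushed.
  virtual void dirtyblock(int block) = 0;
  // Return the size of a block.
  virtual int blocksize() = 0;
};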
Design Issues
Disadvantage of message passing:
  Messages are copied and passed back and forth.
Disadvantages of buffer passing:
  The user is given access to system memory (the buffer itself).
  The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed.
  The pointer might become stale when the buffer pool replaces the contents of a buffer.
Programmer's View of Files
Logical view of files:
  An array of bytes.
  A file pointer marks the current position.
Three fundamental operations:
  Read bytes from current position (move file pointer)
  Write bytes to current position (move file pointer)
  Set file pointer to specified byte position.
C++ File Functions
#include <fstream>
void fstream::open(char* name, openmode mode);
  Example: ios::in | ios::binary
void fstream::close();
fstream::read(char* ptr, int numbytes);
fstream::write(char* ptr, int numbytes);
fstream::seekg(int pos);
fstream::seekg(int pos, ios::cur);
fstream::seekp(int pos);
fstream::seekp(int pos, ios::end);
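The three fundamental operations (read, write, set file pointer) might be combined as in the following sketch. The helper names readAt and writeAt are hypothetical; the stream calls themselves are the standard library versions of the functions listed above, using the standard spelling std::ios::beg for "from the start of the file".

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Overwrite bytes starting at byte offset `pos` of an existing file.
// Returns false if the file could not be opened or the write failed.
bool writeAt(const std::string& path, std::streamoff pos, const std::string& bytes) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekp(pos, std::ios::beg);   // set the (write) file pointer
    f.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(f);
}

// Read up to `count` bytes starting at byte offset `pos`.
// Returns fewer bytes if the file ends early, or an empty string on failure.
std::string readAt(const std::string& path, std::streamoff pos, std::size_t count) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) return std::string();
    std::string buf(count, '\0');
    f.seekg(pos, std::ios::beg);   // set the (read) file pointer
    f.read(&buf[0], static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(f.gcount()));
    return buf;
}
```

Opening with ios::in | ios::out keeps the existing contents, so writeAt changes only the bytes it targets.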
External Sorting
Problem: Sorting data sets too large to fit into main memory.
  Assume data are stored on disk drive.
To sort, portions of the data must be brought into main memory, processed, and returned to disk.
An external sort should minimize disk accesses.
Model of External Computation
Secondary memory is divided into equal-sized blocks (512, 1024, etc.)
A basic I/O operation transfers the contents of one disk block to/from main memory.
Under certain circumstances, reading blocks of a file in sequential order is more efficient. (When?)
Primary goal is to minimize I/O operations.
Assume only one disk drive is available.
Key Sorting
Often, records are large, keys are small.
  Ex: Payroll entries keyed on ID number
Approach 1: Read in entire records, sort them, then write them out again.
Approach 2: Read only the key values, store with each key the location on disk of its associated record.
After keys are sorted the records can be read and rewritten in sorted order.
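Approach 2 can be sketched as follows. A std::vector stands in for the disk file and a vector index stands in for the record's location on disk; the Record and KeyEntry types and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical large record with a small key.
struct Record {
    int id;               // key (e.g. an ID number)
    std::string payload;  // the bulky part we avoid moving while sorting
};

// What is actually sorted: the key plus where its record lives.
struct KeyEntry {
    int key;
    std::size_t pos;      // stand-in for the record's location on disk
};

// Return record positions in sorted key order, touching only the keys.
std::vector<std::size_t> sortedPositions(const std::vector<Record>& file) {
    std::vector<KeyEntry> index;
    index.reserve(file.size());
    for (std::size_t i = 0; i < file.size(); ++i) {
        index.push_back(KeyEntry{file[i].id, i});
    }
    std::sort(index.begin(), index.end(),
              [](const KeyEntry& a, const KeyEntry& b) { return a.key < b.key; });

    std::vector<std::size_t> order;
    order.reserve(index.size());
    for (const KeyEntry& e : index) order.push_back(e.pos);
    assert(order.size() == file.size());
    return order;
}
```

Reading the records in the returned order yields them sorted by key, though on a real disk that final pass involves random access to the record locations.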
Simple External Mergesort (1)
Quicksort requires random access to the entire set of records.
Better: Modified Mergesort algorithm.
Process n elements in Θ(log n) passes.
A group of sorted records is called a run.
Simple External Mergesort (2)
1. Split the file into two files.
2. Read in a block from each file.
3. Take first record from each block, output them in sorted order.
4. Take next record from each block, output them to a second file in sorted order.
5. Repeat until finished, alternating between output files. Read new input blocks as needed.
6. Repeat steps 2-5, except this time input files have runs of two sorted records that are merged together.
7. Each pass through the files provides larger runs.
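The pass structure can be sketched in memory, with vectors standing in for the two input files and the two output files. Block buffering is omitted, and the names File, mergeRun, and simpleExternalMergesort are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using File = std::vector<int>;  // stand-in for a file of records

// Merge one run (at most runLen records) from each input onto `out`.
static void mergeRun(const File& a, std::size_t& i,
                     const File& b, std::size_t& j,
                     std::size_t runLen, File& out) {
    std::size_t endA = std::min(i + runLen, a.size());
    std::size_t endB = std::min(j + runLen, b.size());
    while (i < endA && j < endB) out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    while (i < endA) out.push_back(a[i++]);
    while (j < endB) out.push_back(b[j++]);
}

File simpleExternalMergesort(const File& input) {
    File src[2], dst[2];
    // Split the file into two files (runs of length 1).
    for (std::size_t k = 0; k < input.size(); ++k) src[k % 2].push_back(input[k]);

    // Each pass doubles the run length.
    for (std::size_t runLen = 1; runLen < input.size(); runLen *= 2) {
        dst[0].clear();
        dst[1].clear();
        std::size_t i = 0, j = 0;
        int target = 0;
        while (i < src[0].size() || j < src[1].size()) {
            mergeRun(src[0], i, src[1], j, runLen, dst[target]);
            target = 1 - target;  // alternate between output files
        }
        src[0].swap(dst[0]);
        src[1].swap(dst[1]);
    }
    assert(src[1].empty());  // everything ends up as a single run in file 0
    return src[0];
}
```

For five records the sketch makes three passes, with run lengths 1, 2, and 4.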
[Figure: Simple External Mergesort (3)]
Problems with Simple Mergesort
Is each pass through input and output files sequential?
What happens if all work is done on a single disk drive?
How can we reduce the number of Mergesort passes?
In general, external sorting consists of two phases:
  Break the files into initial runs
  Merge the runs together into a single run.
Breaking a File into Runs
General approach:
  Read as much of the file into memory as possible.
  Perform an in-memory sort.
  Output this group of records as a single run.
Replacement Selection (1)
Break available memory into an array for the heap, an input buffer, and an output buffer.
Fill the array from disk.
Make a min-heap.
Send the smallest value (root) to the output buffer.
Replacement Selection (2)
If the next key in the file is greater than the last value output, then
  Replace the root with this key
else
  Replace the root with the last key in the array
  Add the next record in the file to a new heap (actually, stick it at the end of the array).
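A minimal sketch of replacement selection follows. It departs from the slide in two labeled ways: it uses a std::priority_queue plus a separate holding vector instead of a single shared array, and it uses >= rather than "greater than" so that equal keys can join the current run. The function name and the memSize parameter are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Split `input` into sorted runs using memory for at most memSize keys.
std::vector<std::vector<int>> replacementSelection(const std::vector<int>& input,
                                                   std::size_t memSize) {
    assert(memSize > 0);
    std::vector<std::vector<int>> runs;
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;  // min-heap
    std::vector<int> held;  // keys too small for the current run
    std::size_t pos = 0;

    // Fill the working memory from the input.
    while (pos < input.size() && heap.size() < memSize) heap.push(input[pos++]);

    while (!heap.empty()) {
        std::vector<int> run;
        while (!heap.empty()) {
            int last = heap.top();  // smallest value goes to the output
            heap.pop();
            run.push_back(last);
            if (pos < input.size()) {
                int next = input[pos++];
                if (next >= last) heap.push(next);  // can still join this run
                else held.push_back(next);          // saved for the next run
            }
        }
        runs.push_back(run);
        // Build the next heap from the held keys.
        for (int k : held) heap.push(k);
        held.clear();
    }
    return runs;
}
```

With memory for 3 keys, the input 3 1 4 1 5 9 2 6 produces the runs (1 1 3 4 5 6 9) and (2): the first run is more than twice the memory size because the input happens to be mostly ascending.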
[Figure: RS Example]
Snowplow Analogy (1)
Imagine a snowplow moving around a circular track on which snow falls at a steady rate.
At any instant, there is a certain amount of snow S on the track. Some falling snow comes in front of the plow, some behind.
During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution.
Thus, the plow removes 2S amount of snow.
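The step from "all of S plus half of what falls" to 2S can be spelled out with a short steady-state argument. The symbol F, for the snow that falls during one revolution, is introduced here only for illustration.

```latex
\begin{align*}
\text{snow removed in one revolution} &= S + \tfrac{1}{2}F \\
\text{steady state (removed = fallen)} &: \quad S + \tfrac{1}{2}F = F \\
\text{hence} \quad F &= 2S
\end{align*}
```

In replacement selection, S plays the role of the working memory, which is why runs average about twice the memory size.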
[Figure: Snowplow Analogy (2)]
Problems with Simple Merge
Simple mergesort: Place runs into two files.
Merge the first two runs to output file, then next two runs, etc.
Repeat process until only one run remains.
  How many passes for r initial runs?
  Is there benefit from sequential reading?
  Is working memory well used?
Need a way to reduce the number of passes.
Multiway Merge (1)
With replacement selection, each initial run is several blocks long.
Assume each run is placed in a separate file.
Read the first block from each file into memory and perform an r-way merge.
When a buffer becomes empty, read a block from the appropriate run file.
Each record is read only once from disk during the merge process.
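The r-way merge itself can be sketched with a min-heap holding the next unread key from each run. In-memory vectors stand in for the run files here, so the "read a block when a buffer becomes empty" step is reduced to advancing an index; the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge r sorted runs into one sorted sequence.
std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& runs) {
    typedef std::pair<int, std::size_t> Entry;  // (key, index of its run)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread position per run
    std::size_t total = 0;

    // Start with the first key of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); ++r) {
        total += runs[r].size();
        if (!runs[r].empty()) {
            heap.push(Entry(runs[r][0], r));
            next[r] = 1;
        }
    }

    std::vector<int> out;
    out.reserve(total);
    while (!heap.empty()) {
        Entry smallest = heap.top();
        heap.pop();
        out.push_back(smallest.first);
        std::size_t r = smallest.second;
        // Replace the key just output with the next key from the same run.
        if (next[r] < runs[r].size()) heap.push(Entry(runs[r][next[r]++], r));
    }
    assert(out.size() == total);  // each record is consumed exactly once
    return out;
}
```

Each record enters and leaves the heap once, which mirrors the point that every record is read from disk only once during the merge.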
Multiway Merge (2)
In practice, use only one file and seek to appropriate block.
Limits to Multiway Merge (1)
Assume working memory is b blocks in size.
How many runs can be processed at one time?
The runs are 2b blocks long (on average).
How big a file can be merged in one pass?
Limits to Multiway Merge (2)
Larger files will need more passes -- but the run size grows quickly!
This approach trades Θ(log b) (possibly) sequential passes for a single or very few random (block) access passes.
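A rough answer to the questions about how many runs and how large a file can be handled, assuming one block-sized input buffer per run and ignoring the space taken by the output buffer:

```latex
\begin{align*}
\text{runs merged at one time} &\approx b \\
\text{file size merged in one pass} &\approx b \cdot 2b = 2b^{2} \text{ blocks} \\
\text{file size merged in } k \text{ passes} &\approx 2b^{k+1} \text{ blocks}
\end{align*}
```

For example, with b = 128 blocks of working memory, a single merge pass covers about 2 x 128 x 128 = 32,768 blocks.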
General Principles
A good external sorting algorithm will seek to do the following:
  Make the initial runs as long as possible.
  At all stages, overlap input, processing and output as much as possible.
  Use as much working memory as possible. Applying more memory usually speeds processing.
  If possible, use additional disk drives for more overlapping of processing with I/O, and allow for more sequential file processing.
Search
Given: Distinct keys $k_1, k_2, \ldots, k_n$ and a collection $T$ of $n$ records of the form
$(k_1, I_1), (k_2, I_2), \ldots, (k_n, I_n)$
where $I_j$ is the information associated with key $k_j$ for $1 \le j \le n$.
Search problem: for a key value $K$, locate the record $(k_j, I_j)$ in $T$ such that $k_j = K$.
A successful search is one in which a record with key $k_j = K$ is found.
An unsuccessful search is one in which no record with $k_j = K$ is found (and presumably no such record exists).
Approaches to Search
1. Sequential and list methods (lists, tables, arrays).
2. Direct access by key value (hashing)
3. Tree indexing methods.
Searching Ordered Arrays
Sequential Search
Binary Search
Dictionary Search
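As a reminder of how the second of these works, here is a minimal binary search sketch on a sorted array. The function name and the convention of returning -1 for an unsuccessful search are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index of K in the sorted array, or -1 if K is not present.
int binarySearch(const std::vector<int>& sorted, int K) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::size_t lo = 0;
    std::size_t hi = sorted.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] == K) return static_cast<int>(mid);
        if (sorted[mid] < K) lo = mid + 1;  // K can only be to the right
        else hi = mid;                      // K can only be to the left
    }
    return -1;  // unsuccessful search
}
```

Each comparison halves the remaining range, so the search examines on the order of log n positions, compared with up to n for sequential search.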
Lists Ordered by Frequency
Order lists by (expected) frequency of occurrence.
  Perform sequential search
Cost to access first record: 1
Cost to access second record: 2
Expected search cost:
$\overline{C}_n = 1p_1 + 2p_2 + \cdots + np_n.$
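The expected cost is a direct weighted sum, which the following sketch evaluates for any list of access probabilities given in list order. The function name is illustrative, and the probabilities are assumed to be non-negative and to sum to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expected number of comparisons for sequential search when the record
// in position i (1-based) is requested with probability p[i-1].
double expectedSearchCost(const std::vector<double>& p) {
    double cost = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i] >= 0.0);
        cost += static_cast<double>(i + 1) * p[i];  // position (i+1) costs i+1
    }
    return cost;
}
```

For four records with probability 0.25 each, the result is 2.5; placing the most likely records first lowers it.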
Examples(1)
(1) All records have equal frequency.
$\overline{C}_n = \sum_{i=1}^{n} i/n = (n+1)/2.$
Examples(2)
(2) Exponential Frequency
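$p_i = \begin{cases} 1/2^{i} & \text{if } 1 \le i \le n-1 \\ 1/2^{n-1} & \text{if } i = n \end{cases}$
$\overline{C}_n \approx \sum_{i=1}^{n} (i/2^{i}) \approx 2.$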
Zipf Distributions
Applications:
  Distribution for frequency of word usage in natural languages.
  Distribution for populations of cities, etc.
$\overline{C}_n = \sum_{i=1}^{n} i/(i H_n) = n/H_n \approx n/\log_e n.$
80/20 rule:
  80% of accesses are to 20% of the records.
  For distributions following 80/20 rule, $\overline{C}_n \approx 0.1n.$
Self-Organizing Lists
Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses.
Self-organizing lists use a heuristic for deciding how to reorder the list. These heuristics are similar to the rules for managing buffer pools.
Heuristics
1. Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.)
2. Move-to-Front: When a record is found, move it to the front of the list.
3. Transpose: When a record is found, swap it with the record ahead of it.
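Heuristics 2 and 3 can be sketched on an array-based list as follows. The function names are illustrative, and a std::vector is assumed for simplicity; move-to-front is cheaper on a linked list, where no elements need to shift.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sequential search with the move-to-front heuristic.
// Returns true if `key` was found (and moved to the front).
bool findMoveToFront(std::vector<int>& list, int key) {
    std::vector<int>::iterator it = std::find(list.begin(), list.end(), key);
    if (it == list.end()) return false;
    // Rotate so the found record becomes first; the records that were
    // ahead of it each shift back by one position.
    std::rotate(list.begin(), it, it + 1);
    return true;
}

// Sequential search with the transpose heuristic.
// Returns true if `key` was found (and swapped one step forward).
bool findTranspose(std::vector<int>& list, int key) {
    std::vector<int>::iterator it = std::find(list.begin(), list.end(), key);
    if (it == list.end()) return false;
    if (it != list.begin()) std::iter_swap(it, it - 1);
    return true;
}
```

Starting from 1 2 3 4, a move-to-front search for 3 gives 3 1 2 4, and a following transpose search for 2 gives 3 2 1 4.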
Text Compression Example
Application: Text Compression.
Keep a table of words already seen, organized via Move-to-Front heuristic.
If a word has not yet been seen, send the word.
Otherwise, send (current) index in the table.
  The car on the left hit the car I left.
  The car on 3 left hit 3 5 I 5.
This is similar in spirit to Ziv-Lempel coding.
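The encoding in the example can be reproduced with a short sketch. Several details are assumptions chosen to match the example rather than stated in the notes: table positions are counted from 1, new words are inserted at the front of the table, words are compared without regard to case, and the input is already split on whitespace with punctuation removed. The function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <list>
#include <sstream>
#include <string>

// Encode whitespace-separated words using a move-to-front word table.
std::string mtfEncode(const std::string& text) {
    std::list<std::string> table;  // front = most recently seen word
    std::istringstream in(text);
    std::ostringstream out;
    std::string word;
    bool first = true;
    while (in >> word) {
        // Compare words case-insensitively ("The" and "the" match).
        std::string key = word;
        std::transform(key.begin(), key.end(), key.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (!first) out << ' ';
        first = false;

        std::list<std::string>::iterator it = std::find(table.begin(), table.end(), key);
        if (it == table.end()) {
            out << word;  // not yet seen: send the word itself
        } else {
            out << (std::distance(table.begin(), it) + 1);  // send current index
            table.erase(it);
        }
        table.push_front(key);  // move (or insert) to the front
    }
    return out.str();
}
```

Calling mtfEncode("The car on the left hit the car I left") returns "The car on 3 left hit 3 5 I 5", matching the example.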
Searching in Sets
For dense sets (small range, high percentage of elements in set).
Can use logical bit operators.
Example: To find all primes that are odd numbers, compute:
  0011010100010100 & 0101010101010101
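The same computation can be sketched with std::bitset, where bit v is set when the value v is in the set. The alias Set16 and the helper names are illustrative. Note that the bit strings in the example are written with the bit for 0 on the left, whereas std::bitset prints the highest bit first, so the sketch sets bits by value instead of parsing the strings.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

typedef std::bitset<16> Set16;  // a set over the values 0..15

// Build a set from a list of member values.
Set16 makeSet(std::initializer_list<int> members) {
    Set16 s;
    for (int v : members) {
        assert(v >= 0 && v < 16);
        s.set(static_cast<std::size_t>(v));
    }
    return s;
}

// Intersect the primes below 16 with the odd numbers below 16.
Set16 oddPrimesBelow16() {
    Set16 primes = makeSet({2, 3, 5, 7, 11, 13});
    Set16 odds = makeSet({1, 3, 5, 7, 9, 11, 13, 15});
    return primes & odds;  // one logical AND computes the intersection
}
```

The result contains 3, 5, 7, 11, and 13; the only prime removed is 2.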
Hashing (1)
Hashing: The process of mapping a key value to a position in a table.
A hash function maps key values to positions. It is denoted by h.
A hash table is an array that holds the records. It is denoted by HT.
HT has M slots, indexed from 0 to M-1.
Hashing (2)
For any value Kin the key range and some hashfunction h h (K) i