
DATA STRUCTURES + SORTING + STRINGdbecer/courses/Fall2016/321/Lecture2.pdf · SORTING + STRING COMP 321 – McGill University These slides are mainly compiled from the following ...

Apr 12, 2019


vocong

DATA STRUCTURES + SORTING + STRING
COMP 321 – McGill University

These slides are mainly compiled from the following resources:
- Professor Jaehyun Park's slides for CS 97SI.
- TopCoder tutorials.
- The Programming Challenges book.


Comments
• Urgent => https://open.kattis.com/
• Contest Saturday VS ICPC.
• No details => go over the topics on the weekend.
• Don't overstress, and don't be overconfident.


Data Structure
• A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.


Data Structure
• Basic data structures (built-in libraries).
  •  Linear DS.
  •  Non-Linear DS.
• Data structures (own libraries).
  •  Graphs.
  •  Union-Find Structures.
  •  Segment Tree.


Data Structures
• Basic data structures (built-in libraries).
  •  Linear DS (ordering the elements sequentially).
    •  Static array (array in C/C++ and in Java).
    •  Resizable array (C++ STL <vector> and Java ArrayList).
    •  Linked list (C++ STL <list> and Java LinkedList).
    •  Stack (C++ STL <stack> and Java Stack).
    •  Queue (C++ STL <queue> and Java Queue).


Data Structures
• Basic data structures (built-in libraries).
  •  Non-Linear DS.
    •  Balanced binary search tree (C++ STL <map>/<set> and Java TreeMap/TreeSet).
      •  AVL and Red-Black trees are balanced BSTs.
      •  <map> stores (key -> data) pairs VS <set> stores only the keys.
    •  Heap (C++ STL <queue>: priority_queue and Java PriorityQueue).
      •  Complete binary tree.
      •  Heap property VS BST property.
    •  Hash table (Java HashMap/HashSet/Hashtable).
      •  Non-synchronized (HashMap) vs synchronized (Hashtable).
      •  Nulls allowed (HashMap) vs no nulls (Hashtable).
      •  Predictable iteration order vs unpredictable.




Question for you.
• Basic data structures (built-in libraries).
  •  Non-Linear DS (non-sequential ordering).
    •  Balanced binary search tree (C++ STL <map>/<set> and Java TreeMap/TreeSet).
      •  AVL and Red-Black trees are balanced BSTs.
      •  <map> stores (key -> data) pairs VS <set> stores only the keys.
    •  Heap (C++ STL <queue> and Java PriorityQueue).
      •  Heap property VS BST property.
      •  Complete binary tree.
    •  Hash table.

[Figure labels: BST, HEAP]


Deciding the Order of the Tasks
• Returns the newest task (stack)
• Returns the oldest task (queue)
• Returns the most urgent task (priority queue)
• Returns the easiest task (priority queue)


STACK
• Last in, first out (LIFO)
• Stacks model piles of objects (such as dinner plates)
• Supports three constant-time operations
  •  Push(x): inserts x into the stack
  •  Pop(): removes the newest item
  •  Top(): returns the newest item
• Very easy to implement using an array


STACK
• Have a large enough array s[] and a counter k, which starts at zero
  •  Push(x): set s[k] = x and increment k by 1
  •  Pop(): decrement k by 1 (error if k is zero)
  •  Top(): returns s[k - 1] (error if k is zero)
• C++ and Java have implementations of stack
  •  stack (C++), Stack (Java)
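The array-and-counter scheme above can be sketched in a few lines of Python. The class name `ArrayStack` and the fixed `capacity` parameter are assumptions made for illustration; the names `s` and `k` follow the slide.

```python
class ArrayStack:
    """Fixed-capacity stack: a preallocated list s plus a counter k."""

    def __init__(self, capacity):
        self.s = [None] * capacity  # the "large enough" array (assumed size)
        self.k = 0                  # number of items currently stored

    def push(self, x):
        if self.k == len(self.s):
            raise OverflowError("stack is full")
        self.s[self.k] = x          # set s[k] = x ...
        self.k += 1                 # ... and increment k by 1

    def pop(self):
        if self.k == 0:
            raise IndexError("pop from empty stack")
        self.k -= 1                 # the old top is simply forgotten

    def top(self):
        if self.k == 0:
            raise IndexError("top of empty stack")
        return self.s[self.k - 1]
```

All three operations touch one array slot and one counter, which is why they run in constant time.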


STACK
• Useful for:
  •  Processing nested formulas
  •  Depth-first graph traversal
  •  Data storage in recursive algorithms


QUEUE
• First in, first out (FIFO)
• Supports three constant-time operations
  •  Enqueue(x): inserts x into the queue
  •  Dequeue(): removes the oldest item
  •  Front(): returns the oldest item
• Implementation is similar to that of stack


QUEUE
• Assume that you know the total number of elements that enter the queue
  •  ... which allows you to use an array for implementation
  •  ... If not, you can use linked lists or doubly linked lists
• Maintain two indices head and tail
  •  Dequeue() increments head
  •  Enqueue() increments tail
  •  Use the value of tail - head to check emptiness
• You can use queue (C++) and Queue (Java)
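A minimal sketch of the head/tail scheme, assuming (as the slide does) that the total number of enqueued elements is known in advance. The class name `ArrayQueue` and the parameter `max_total` are illustrative, not from the slides.

```python
class ArrayQueue:
    """Array-backed FIFO queue with two indices, head and tail."""

    def __init__(self, max_total):
        self.a = [None] * max_total  # room for every element that will ever enter
        self.head = 0                # index of the oldest item
        self.tail = 0                # index one past the newest item

    def is_empty(self):
        return self.tail - self.head == 0

    def enqueue(self, x):
        if self.tail == len(self.a):
            raise OverflowError("more elements than announced")
        self.a[self.tail] = x
        self.tail += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        self.head += 1

    def front(self):
        if self.is_empty():
            raise IndexError("front of empty queue")
        return self.a[self.head]
```

Slots before `head` are never reused here, which is exactly why the total element count must be known up front.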


QUEUE
• Useful for
  •  implementing buffers
  •  simulating waiting lists
  •  shuffling cards


PRIORITY QUEUE
• Each element in a PQ has a priority value
• Three operations:
  •  Insert(x, p): inserts x, whose priority is p, into the PQ
  •  RemoveTop(): removes the element with the highest priority
  •  Top(): returns the element with the highest priority
• All operations can be done quickly if implemented using a heap (if not, use a sorted array)
• priority_queue (C++), PriorityQueue (Java)
• Useful for
  •  Maintaining schedules / calendars
  •  Simulating events
  •  Sweepline geometric algorithms


HEAP
• Complete binary tree with the heap property:
  •  The value of a node ≥ values of its children
  •  What is the difference between full vs complete?
• The root node has the maximum value
  •  Constant-time top() operation
• Inserting/removing a node can be done in O(log n) time without breaking the heap property
  •  May need rearrangement of some nodes


HEAP
• Start from the root, number the nodes 1, 2, . . . level by level, from left to right
• Given a node k, it is easy to compute the indices of its parent and children
  •  Parent node: floor(k/2)
  •  Children: 2k, 2k + 1


Heap – Inserting a Node
• 1. Make a new node in the last level, as far left as possible
  •  If the last level is full, make a new one
• 2. If the new node breaks the heap property, swap with its parent node
  •  The new node moves up the tree, which may introduce another conflict
• Repeat 2 until all conflicts are resolved
• Running time = tree height = O(log n)


Heap – Deleting a Node
• 1. Remove the root, and bring the last node (rightmost node in the last level) to the root
• 2. If the root breaks the heap property, look at its children and swap it with the larger one
  •  Swapping can introduce another conflict
• 3. Repeat 2 until all conflicts are resolved
• Running time = O(log n)
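The insert and delete procedures can be sketched as a max-heap stored in a 1-indexed array, so that node k has parent k // 2 and children 2k and 2k + 1. The class and method names are assumptions for illustration; in a contest you would normally use the library priority queue instead.

```python
class MaxHeap:
    """Max-heap in a 1-indexed list: parent of k is k // 2, children are 2k and 2k + 1."""

    def __init__(self):
        self.a = [None]  # index 0 is unused padding

    def top(self):
        return self.a[1]  # the root holds the maximum

    def insert(self, x):
        a = self.a
        a.append(x)                      # new node: last level, as far left as possible
        k = len(a) - 1
        while k > 1 and a[k // 2] < a[k]:
            a[k // 2], a[k] = a[k], a[k // 2]  # swap with the parent
            k //= 2

    def remove_top(self):
        a = self.a
        top = a[1]
        last = a.pop()                   # rightmost node in the last level
        n = len(a) - 1                   # number of remaining elements
        if n >= 1:
            a[1] = last                  # bring it to the root
            k = 1
            while 2 * k <= n:
                c = 2 * k
                if c + 1 <= n and a[c + 1] > a[c]:
                    c += 1               # pick the larger child
                if a[k] >= a[c]:
                    break                # heap property holds again
                a[k], a[c] = a[c], a[k]
                k = c
        return top
```

Each loop moves one level up or down the tree, so both operations are bounded by the tree height, O(log n).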


BINARY SEARCH TREE (BST)
• The idea is that each node has at most two children
• A binary tree with the following property: for each node v,
  •  value of v ≥ values in v's left subtree
  •  value of v < values in v's right subtree


BST
• Supports three operations
  •  Insert(x): inserts a node with value x
  •  Delete(x): deletes a node with value x, if there is any
  •  Find(x): returns the node with value x, if there is any
• Many extensions are possible
  •  Count(x): counts the number of nodes with value less than or equal to x
  •  GetNext(x): returns the smallest node with value ≥ x




BST
• Simple implementation cannot guarantee efficiency
  •  In the worst case, tree height becomes n (which makes the BST useless)
• Guaranteeing O(log n) running time per operation requires balancing of the tree (hard to implement).
  •  For example AVL and Red-Black trees (we will skip the details of these balanced trees, but you should review them).
  •  What does balanced mean? => In an AVL tree, the heights of the two child subtrees of any node differ by at most one.
• Use the standard library implementations
  •  set, map (C++)
  •  TreeSet, TreeMap (Java)


Question for you
• Why is a binary search tree preferable to a sorted array of values?
  •  O(?) to find a given key?


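Question for you
• Why is a binary search tree preferable to a sorted array of values?
  •  Finding a given key is O(log n) in both: traversing the BST vs binary search on the array.
  •  The problem is adding a new item: inserting into a sorted array takes O(n) because later elements must shift, while a balanced BST inserts in O(log n).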
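The sorted-array side of this comparison can be illustrated with Python's standard `bisect` module. The helper names `contains` and `insert_sorted` are made up for this sketch.

```python
import bisect

def contains(sorted_list, key):
    """Binary search: O(log n) comparisons."""
    i = bisect.bisect_left(sorted_list, key)
    return i < len(sorted_list) and sorted_list[i] == key

def insert_sorted(sorted_list, key):
    """Finds the position in O(log n), but the list insertion itself
    shifts every later element, so the whole call is O(n)."""
    bisect.insort(sorted_list, key)
```

Lookups are as fast as in a balanced BST; it is the O(n) shift on every insertion that makes a sorted array a poor fit for frequently changing data.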


Hash Tables
• A key is used as an index to locate the associated value.
  •  Content-based retrieval, unlike position-based retrieval.
  •  Hashing is the process of generating an index (the hash value) from a key.
  •  An ideal algorithm must distribute the hash values evenly => the buckets will tend to fill up evenly = fast search.
  •  Two different keys that land in the same hash bucket are known as a “collision”.
  •  Open addressing => a simple rule to decide where to put a new item when the desired space is already occupied.
  •  Chaining => we associate a linked list with each table location.
• Hash tables are excellent dictionary data structures.


Hash Function
• A function that takes a string and outputs a number
  •  A good hash function has few collisions
  •  i.e., if x != y, then H(x) != H(y) with high probability
• An easy and powerful hash function is a polynomial mod some prime p.
  •  Consider each letter as a number (ASCII value is fine)
  •  $H(x_1 \ldots x_k) = x_1 a^{k-1} + x_2 a^{k-2} + \ldots + x_{k-1} a + x_k \pmod{p}$
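The polynomial hash can be evaluated with Horner's rule, which avoids computing powers of a explicitly. The function name `poly_hash` and the default values a = 131 and p = 1,000,000,007 are assumptions chosen for illustration; the slides do not fix them.

```python
def poly_hash(s, a=131, p=1_000_000_007):
    """H(x1..xk) = x1*a^(k-1) + ... + xk (mod p), computed left to right."""
    h = 0
    for ch in s:
        # Multiplying the running value by a raises every earlier term by one power.
        h = (h * a + ord(ch)) % p
    return h
```

For example, `poly_hash("AB")` is 65 * 131 + 66 = 8581, since `ord("A")` is 65 and `ord("B")` is 66.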


Data Structures
• Data structures (own libraries).
  •  Graph.
    •  Let's talk about graphs later.
  •  Union-Find Disjoint Sets.
  •  Segment tree.




Union-Find Structure
• Used to store disjoint sets
  •  What is a disjoint set? => sets whose intersection is the empty set.
• Can support two types of operations efficiently
  •  Find(x): returns the “representative” of the set that x belongs to
  •  Union(x, y): merges the two sets that contain x and y
• Both operations can be done in (essentially) constant time
• Super-short implementation!
• Useful for problems involving partitioning.
  •  Ex: keeping track of connected components.
  •  Kruskal's algorithm (minimum spanning tree).


Union-Find Structure
• Main idea: represent each set by a rooted tree
  •  Every node maintains a link to its parent
  •  A root node is the “representative” of the corresponding set
  •  Example: two sets {x, y, z} and {a, b, c, d}


Union-Find Structure
• Find(x): follow the links from x until a node points to itself
  •  This can take O(n) time but we will make it faster
• Union(x, y): run Find(x) and Find(y) to find the corresponding root nodes and direct one to the other.
• If we assume that the links are stored in L[], then: [code listing not preserved]


Union-Find Structure
• In a bad case, the trees can become too deep
  •  ... which slows down future operations
• Path compression makes the trees shallower every time Find() is called.
• We don't care what a tree looks like as long as the root stays the same
  •  After Find(x) returns the root, backtrack to x and reroute all the links to the root


Question for you
• How can you implement the operation isSameSet(i,j)?


Question for you
• How can you implement the operation isSameSet(i,j)?
  •  Simply call findSet(i) and findSet(j) and check whether both return the same representative.
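Putting the pieces together, here is a minimal sketch of a union-find structure with the parent links stored in a list `L`, as the slides suggest. The class and method names are illustrative. It uses path compression only; union by rank or size, which is usually added to reach the "essentially constant" bound, is left out to keep the sketch short.

```python
class UnionFind:
    def __init__(self, n):
        self.L = list(range(n))  # L[x] is the parent of x; a root points to itself

    def find(self, x):
        # First pass: follow the links until a node points to itself.
        root = x
        while self.L[root] != root:
            root = self.L[root]
        # Second pass (path compression): reroute every node on the path to the root.
        while self.L[x] != root:
            nxt = self.L[x]
            self.L[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.L[rx] = ry      # direct one root to the other

    def is_same_set(self, i, j):
        return self.find(i) == self.find(j)
```

For connected components, call `union(u, v)` for every edge (u, v) and then compare representatives with `is_same_set`.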


Segment Tree
• DS to efficiently answer dynamic range queries.
  •  Range Minimum Query (RMQ): finding the index of the minimum element in an array within a given range [i..j].
  •  Ex. RMQ(1, 3) = 2, RMQ(3, 4) = 4, RMQ(0, 0) = 0, RMQ(0, 1) = 1, and RMQ(0, 6) = 5.
  •  Iterating over the range takes O(n); let's make it faster using a binary tree similar to a heap, but usually not a complete binary tree (aka a segment tree).


Segment Tree
• Binary tree.
• Each node is associated with some interval of the array.
• Each non-leaf node has two children whose associated intervals are disjoint.
• Each child's interval has approximately half the size of the parent's interval.


Segment Tree
• The root covers [0, N - 1]; each segment [l, r] is split into [l, (l + r) / 2] and [(l + r) / 2 + 1, r] until l = r.


Question for you
• What is the complexity of built_segment_tree? O(?)
• With the segment tree ready, what is the complexity of answering an RMQ?
• Can you give the worst case? RMQ(?,?)


Question for you
• What is the complexity of built_segment_tree? => O(n)
  •  There are 2n-1 nodes in total.
• With the segment tree ready, what is the complexity of answering an RMQ? => O(log n) (2 root-to-leaf paths)
  •  Ex. RMQ(4,6) = blue line.
  •  Ex. RMQ(1,3) = red line.
  •  Ex. RMQ(3,4) = worst case => one path from [0,6] to [3,3] and another from [0,6] to [4,4].


Segment Tree
• If the array A is static, then use a Dynamic Programming solution that requires O(n log n) pre-processing and O(1) per RMQ.
• Segment tree becomes useful if array A is frequently updated.
  •  Ex. Updating A[5] takes O(log n) vs O(n log n) required by DP.
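A compact sketch of a segment tree for RMQ that stores, in each node, the index of the minimum within that node's interval. The class and method names are illustrative, and the array is assumed to be non-empty. The sample array in the usage note below is an assumption: it does not appear in this transcript and was chosen only because it reproduces the RMQ answers quoted on the earlier slide.

```python
class SegmentTree:
    def __init__(self, A):
        self.A = list(A)                 # assumed non-empty
        self.n = len(self.A)
        self.st = [0] * (4 * self.n)     # st[p] = index of the minimum in node p's interval
        self._build(1, 0, self.n - 1)

    def _better(self, i, j):
        # Index of the smaller value; ties go to the left index.
        return i if self.A[i] <= self.A[j] else j

    def _build(self, p, l, r):
        if l == r:
            self.st[p] = l
            return
        m = (l + r) // 2
        self._build(2 * p, l, m)
        self._build(2 * p + 1, m + 1, r)
        self.st[p] = self._better(self.st[2 * p], self.st[2 * p + 1])

    def rmq(self, i, j, p=1, l=0, r=None):
        """Index of the minimum of A[i..j] (inclusive)."""
        if r is None:
            r = self.n - 1
        if i <= l and r <= j:            # node interval fully inside the query
            return self.st[p]
        m = (l + r) // 2
        if j <= m:
            return self.rmq(i, j, 2 * p, l, m)
        if i > m:
            return self.rmq(i, j, 2 * p + 1, m + 1, r)
        return self._better(self.rmq(i, j, 2 * p, l, m),
                            self.rmq(i, j, 2 * p + 1, m + 1, r))

    def update(self, idx, value, p=1, l=0, r=None):
        """Set A[idx] = value and repair the nodes on the root-to-leaf path."""
        if r is None:
            r = self.n - 1
        if l == r:
            self.A[idx] = value
            return
        m = (l + r) // 2
        if idx <= m:
            self.update(idx, value, 2 * p, l, m)
        else:
            self.update(idx, value, 2 * p + 1, m + 1, r)
        self.st[p] = self._better(self.st[2 * p], self.st[2 * p + 1])
```

With `A = [18, 17, 13, 19, 15, 11, 20]`, `SegmentTree(A).rmq(1, 3)` gives 2 and `rmq(0, 6)` gives 5, matching the examples above; after `update(5, 100)` the minimum of the whole array moves to index 2.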


Fenwick Tree
• Full binary tree with at least n leaf nodes
  •  We will use n = 8 for our example
• kth leaf node stores the value of item k
• Each internal node stores the sum of values of its children
  •  e.g., Red node stores item[5] + item[6]


Summing Consecutive Values
• Main idea: choose the minimal set of nodes whose sum gives the desired value
  •  at most 1 node is chosen at each level, so the total number of nodes we look at is about log2 n
  •  and this can be done in O(log n) time


Summing Consecutive Values
• Sum(7), Sum(8), Sum(6), Sum(3): in each case, the result is the sum of the values of the gold-colored nodes. [Four figures, one per query, not preserved.]


Summing Consecutive Values
• Say we want to compute Sum(k)
  •  Maintain a pointer P which initially points at leaf k
  •  Climb the tree using the following procedure:
    •  If P is pointing to a left child of some node:
      •  Add the value of P
      •  Set P to the parent node of P's left neighbor
      •  If P has no left neighbor, terminate
    •  Otherwise:
      •  Set P to the parent node of P
• Use an array to implement


Updating a Value
• Say we want to do Set(k, x) (set the value of leaf k as x)
  •  1. Start at leaf k, change its value to x
  •  2. Go to its parent, and recompute its value
  •  3. Repeat 2 until the root
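The Sum(k) and Set(k, x) procedures can be sketched directly on the full binary tree, stored in an array where node p has children 2p and 2p + 1 and leaf k (1-indexed) sits at index n + k - 1. The class name is made up, n is assumed to be a power of two, and treating the root like a left child is an assumption of this sketch. Practical Fenwick tree implementations usually compress this tree into an array of n entries indexed by the lowest set bit; that variant is not shown here.

```python
class FenwickStyleTree:
    def __init__(self, n):
        self.n = n                    # assumed to be a power of two (e.g. 8)
        self.t = [0] * (2 * n)        # t[1] is the root, t[n .. 2n-1] are the leaves

    def set(self, k, x):
        """Set item k (1-indexed) to x, then recompute every ancestor."""
        p = self.n + k - 1
        self.t[p] = x
        p //= 2
        while p >= 1:
            self.t[p] = self.t[2 * p] + self.t[2 * p + 1]
            p //= 2

    def sum(self, k):
        """Sum of items 1..k, following the climbing procedure from the slides."""
        p = self.n + k - 1
        total = 0
        while True:
            if p % 2 == 0 or p == 1:      # left child (the root is handled the same way)
                total += self.t[p]
                if (p & (p - 1)) == 0:    # leftmost node of its level: no left neighbor
                    return total
                p = (p - 1) // 2          # parent of the left neighbor
            else:                         # right child: just move to the parent
                p //= 2
```

With n = 8, `sum(7)` adds the leaf for item 7, the node covering items 5 and 6, and the node covering items 1 to 4, which is one node per level at most.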




SORTING
• Practical applications in computing require things to be in order.
• To consider:
  •  Runtime.
  •  Memory space.
    •  In-place algorithms => sort without creating copies of the data.
  •  Stability.
    •  What happens to elements that are comparatively the same?
    •  Those elements whose comparison key is the same will remain in the same relative order after sorting as they were before sorting.


Bubble Sort
• Pass through the data repeatedly and swap two adjacent elements whenever the first is greater than the second. Thus, the smallest elements will “bubble” to the surface.
• O(n^2).
• Simple to understand and code from memory + Stable + In-place.


Insertion Sort
• It seeks to sort a list one element at a time. With each iteration, it takes the next element waiting to be sorted, and adds it, in its proper location, to those elements that have already been sorted.
• O(n^2).
  •  It works very efficiently for lists that are nearly sorted initially.
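A short in-place sketch of the idea; the function name is illustrative.

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):
        x = a[i]                  # next element waiting to be sorted
        j = i - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x              # drop x into its proper location
    return a
```

On a nearly sorted list the inner loop exits almost immediately, which is why the running time approaches O(n) in that case. Using a strict `>` keeps equal elements in their original order, so the sort is stable.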


Merge Sort
• A merge sort works recursively (divide and conquer). Divide the unsorted list into n sublists, each containing 1 element. Then, merge sublists to produce new sorted sublists.
• O(n log n).
  •  Fairly efficient + can be used to solve other problems.
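A minimal top-down sketch that returns a new sorted list; the function name is illustrative, and this version trades extra memory for simplicity.

```python
def merge_sort(a):
    """Return a new sorted list containing the elements of a."""
    if len(a) <= 1:
        return list(a)            # a sublist with 0 or 1 elements is already sorted
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # "<=" takes from the left first, keeping the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])       # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged
```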


Heap Sort
• All data from a list is inserted into a heap, and then the root element is repeatedly removed and stored back into the list.
• O(n log n).
• Not stable.


Quick Sort
• Divide the data into two groups of “high” values and “low” values around a pivot. Then, recursively process the two groups. Finally, reassemble the now sorted list.
• O(n^2) in the worst case, O(n log n) on average
  •  dependent upon how well the chosen pivot approximates the midpoint (median) value
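The low/high split can be sketched as follows. This version is not in-place and picks a random pivot, which is one common way to make the quadratic worst case unlikely; both choices are assumptions of this sketch rather than something the slides prescribe.

```python
import random

def quick_sort(a):
    """Return a new sorted list containing the elements of a."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)
    low = [x for x in a if x < pivot]      # the "low" values
    equal = [x for x in a if x == pivot]   # copies of the pivot itself
    high = [x for x in a if x > pivot]     # the "high" values
    # Recursively sort both groups, then reassemble.
    return quick_sort(low) + equal + quick_sort(high)
```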


Radix Sort
• Sort data without having to directly compare elements to each other. It groups keys by the individual digits which share the same significant position and value.
• O(n * k), where k is the size of the key.
• Some types of data may use very long keys, or may not easily lend themselves to a representation that can be processed


Sorting Libraries
• The Java API and the C++ STL both provide built-in sorting capabilities.
• Check the Java interface called Comparable => you add a method int compareTo(T other), which returns a negative value if this object is less than the parameter, 0 if equal, or a positive value if greater.
• Also check the interface called Comparator, which defines a single method int compare(T obj1, T obj2) that returns a value indicating the result of comparing the two parameters.


STRINGS


String Matching Problem
• Given a text T and a pattern P, find all occurrences of P within T
• Notations:
  •  n and m: lengths of T and P
  •  Σ: the alphabet (of constant size)
  •  $P_i$: i-th letter of P (1-indexed)
  •  a, b, c: single letters in Σ
  •  x, y, z: strings


String Matching Problem
• T = AGCATGCTGCAGTCATGCTTAGGCTA
• P = GCT
• P appears three times in T
• A naive method takes O(mn) time
  •  Initiate string comparison at every starting point
  •  Each comparison takes O(m) time
• We can do much better!
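For reference, the naive method fits in a few lines. The function name is illustrative; here n = len(T) and m = len(P), and positions are reported 0-indexed.

```python
def naive_find_all(T, P):
    """Try every starting point in T; each slice comparison costs O(m)."""
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s + m] == P]
```

On the example above, `naive_find_all("AGCATGCTGCAGTCATGCTTAGGCTA", "GCT")` finds the three occurrences at 0-indexed positions 5, 16, and 22.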


String Matching Problem - Hash
• Main idea: preprocess T to speed up queries
  •  Hash every substring of length k
  •  k is a small constant
• For each query P, hash the first k letters of P to retrieve all the occurrences of it within T
• Don't forget to check collisions!




String Matching Problem - Hash
• Pros:
  •  Easy to implement
  •  Significant speedup in practice
• Cons:
  •  Doesn't help the asymptotic efficiency
    •  Can still take O(nm) time if hashing is terrible or data is difficult
    •  Can you give me an example of the worst case? => When all the characters of the pattern and the text are the same: T=AAAAAAA… P=AAA.
  •  A lot of memory consumption


SMP - Knuth-Morris-Pratt (KMP)
• A linear time (!) algorithm that solves the string matching problem by preprocessing P in O(m) time
  •  Main idea is to skip some comparisons by using the previous comparison result.
• Uses an auxiliary array π that is defined as follows:
  •  π[i] is the largest integer smaller than i such that $P_1 \ldots P_{\pi[i]}$ is a suffix of $P_1 \ldots P_i$
  •  e.g., π[6] = 4 since abab is a suffix of ababab
  •  e.g., π[9] = 0 since no prefix of length ≤ 8 ends with c


Question for you
• Why is π useful?


SMP - Knuth-Morris-Pratt (KMP)
• T = ABC ABCDAB ABCDABCDABDE
• P = ABCDABD
• π = (0,0,0,0,1,2,0)
• Start matching at the first position of T:
• Mismatch at the 4th letter of P!


SMP - Knuth-Morris-Pratt (KMP)
• We matched k = 3 letters so far, and π[k] = 0
  • Thus, there is no point in starting the comparison at T_2, T_3
• Shift P by k − π[k] = 3 letters
• Mismatch at T_4 again, this time at the 1st letter of P!


SMP - Knuth-Morris-Pratt (KMP)
• We matched k = 0 letters so far
• Shift P by k − π[k] = 1 letter (we define π[0] = −1)
• Mismatch at T_11!


SMP - Knuth-Morris-Pratt (KMP)
• π[6] = 2 means P_1 P_2 is a suffix of P_1 . . . P_6
• Shift P by 6 − π[6] = 4 letters
• Again, no point in shifting P by 1, 2, or 3 letters


SMP - Knuth-Morris-Pratt (KMP)
• Mismatch at T_11 again!
• Currently 2 letters are matched
• Shift P by 2 − π[2] = 2 letters


SMP - Knuth-Morris-Pratt (KMP)
• Mismatch at T_11 again!
• Currently no letters are matched
• Shift P by 0 − π[0] = 1 letter


SMP - Knuth-Morris-Pratt (KMP)
• Mismatch at T_18
• Currently 6 letters are matched
• Shift P by 6 − π[6] = 4 letters


SMP - Knuth-Morris-Pratt (KMP)
• Finally, there it is!
• Currently all 7 letters are matched
• After recording this match (at T_16 . . . T_22), we shift P again in order to find other matches
  • Shift by 7 − π[7] = 7 letters
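The walkthrough above can be replayed with a short Python sketch that follows the "shift P by k − π[k]" rule literally and records where P gets aligned. The π values are the ones given on the slides, with π[0] = −1; the function name `kmp_shift_trace` is an assumption for illustration.

```python
def kmp_shift_trace(text, pattern, pi):
    """Return (alignments, matches) as 1-indexed positions of T where P was placed / found."""
    n, m = len(text), len(pattern)
    start, k = 0, 0  # P is aligned at text[start]; its first k letters are known to match
    alignments, matches = [], []
    while start + m <= n:
        alignments.append(start + 1)
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:
            matches.append(start + 1)
        start += k - pi[k]   # shift P by k - pi[k] letters
        k = max(pi[k], 0)    # after the shift, pi[k] letters are already matched
    return alignments, matches


T = "ABC ABCDAB ABCDABCDABDE"
P = "ABCDABD"
PI = [-1, 0, 0, 0, 0, 1, 2, 0]  # pi[0] = -1, then (0,0,0,0,1,2,0) from the slides
alignments, matches = kmp_shift_trace(T, P, PI)
print(alignments)  # [1, 4, 5, 9, 11, 12, 16]
print(matches)     # [16]
```

The alignments correspond to the shifts of 3, 1, 4, 2, 1 and 4 letters in the walkthrough, and the single match starts at T_16.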


SMP - Knuth-Morris-Pratt (KMP)
• Computing π.
• Obs1 => if P_1 . . . P_{π[i]} is a suffix of P_1 . . . P_i, then P_1 . . . P_{π[i]−1} is a suffix of P_1 . . . P_{i−1}
• Obs2 => all the prefixes of P that are a suffix of P_1 . . . P_i can be obtained by recursively applying π to i
  • e.g., P_1 . . . P_{π[i]}, P_1 . . . P_{π[π[i]]}, P_1 . . . P_{π[π[π[i]]]} are all suffixes of P_1 . . . P_i


SMP - Knuth-Morris-Pratt (KMP)
• Computing π.
• Obs3 (not obvious) =>
  • First, let’s write π^(k)[i] as π[.] applied k times to i
    • e.g., π^(2)[i] = π[π[i]]
  • π[i] is equal to π^(k)[i − 1] + 1, where k is the smallest integer that satisfies P_{π^(k)[i−1]+1} = P_i
    • If there is no such k, π[i] = 0
  • Intuition: we look at all the prefixes of P that are suffixes of P_1 . . . P_{i−1}, and find the longest one whose next letter matches P_i


SMP - Knuth-Morris-Pratt (KMP)
• Implementation of π.
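The code from this slide did not survive extraction. The following is a Python sketch of one common way to compute π in O(m), written to follow Obs3; it is not necessarily the code shown in the lecture. The list is 1-indexed like the slides, with π[0] = −1 by the convention defined earlier.

```python
def prefix_function(pattern):
    """Return pi[0..m] with pi[0] = -1 and pi[i] as defined on the slides for i >= 1."""
    m = len(pattern)
    pi = [-1] + [0] * m
    k = 0  # invariant at the top of each iteration: k == pi[i - 1]
    for i in range(2, m + 1):
        # Obs3: walk through pi[i-1], pi[pi[i-1]], ... until the next letter,
        # P_{k+1} (pattern[k]), equals P_i (pattern[i - 1]).
        while k > 0 and pattern[k] != pattern[i - 1]:
            k = pi[k]
        if pattern[k] == pattern[i - 1]:
            k += 1
        pi[i] = k  # stays 0 when no prefix can be extended
    return pi


print(prefix_function("ABCDABD")[1:])  # [0, 0, 0, 0, 1, 2, 0]
```

The total running time is linear because k increases by at most 1 per iteration and every pass through the inner loop decreases it.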


SMP - Knuth-Morris-Pratt (KMP)
• Implementation of KMP.
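The slide's code is also missing here. A hedged Python sketch of the matching loop is given below. It uses the usual streaming formulation, where k is the number of letters of P currently matched, instead of moving P explicitly. The function names are illustrative, and the prefix-function helper is repeated only so that the block runs on its own.

```python
def prefix_function(pattern):
    """pi[0] = -1; pi[i] = length of the longest proper prefix of P that is a suffix of P_1..P_i."""
    m = len(pattern)
    pi = [-1] + [0] * m
    k = 0
    for i in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[i - 1]:
            k = pi[k]
        if pattern[k] == pattern[i - 1]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, pattern):
    """Return the 1-indexed start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    matches = []
    k = 0  # number of letters of P matched so far
    for j, letter in enumerate(text):
        while k > 0 and pattern[k] != letter:
            k = pi[k]              # fall back instead of restarting from scratch
        if pattern[k] == letter:
            k += 1
        if k == m:
            matches.append(j - m + 2)  # j is the 0-indexed end of the match
            k = pi[k]                  # keep going to find further matches
    return matches


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # [16]
```

Each letter of T is read once and k only falls back as often as it advanced, so the search runs in O(n + m) overall.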


SMP – Suffix trie
• Suffix trie of a string T is a rooted tree that stores all the suffixes (thus all the substrings)
• Each node corresponds to some substring of T
• Each edge is associated with a letter of the alphabet
• For each node that corresponds to ax, there is a special pointer called suffix link that leads to the node corresponding to x
• Surprisingly easy to implement!


SMP – Suffix trie
• Given the suffix trie for T_1 . . . T_n
  • Then we append T_{n+1} = a to T, creating necessary nodes
• Start at node u corresponding to T_1 . . . T_n
  • Create an a-transition to a new node v
• Take the suffix link at u to go to u′, corresponding to T_2 . . . T_n
  • Create an a-transition to a new node v′
  • Create a suffix link from v to v′


SMP – Suffix trie
• Repeat the previous process:
  • Take the suffix link at the current node
  • Make a new a-transition there
  • Create the suffix link from the previous node
• Stop if the node already has an a-transition
  • Because from this point, all nodes that are reachable via suffix links already have an a-transition
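The construction steps above can be sketched in Python as follows. The class and function names (`Node`, `build_suffix_trie`, `count_nodes`) are assumptions for illustration. The root's suffix link is left as `None`, so walking off the root ends the loop. Note that a suffix trie may need a number of nodes quadratic in the length of T.

```python
class Node:
    def __init__(self):
        self.children = {}       # letter -> child node (the "a-transitions")
        self.suffix_link = None  # node for the same string minus its first letter


def build_suffix_trie(text):
    """Build the suffix trie of text one letter at a time, using suffix links."""
    root = Node()
    deepest = root  # node corresponding to the whole string read so far
    for a in text:
        current, previous_new, new_deepest = deepest, None, None
        # Follow suffix links until a node already has an a-transition
        # or we walk past the root.
        while current is not None and a not in current.children:
            new_node = Node()
            current.children[a] = new_node
            if previous_new is None:
                new_deepest = new_node               # extends the whole string
            else:
                previous_new.suffix_link = new_node  # link from the previous new node
            previous_new = new_node
            current = current.suffix_link
        # The last new node links to the existing a-child, or to the root.
        previous_new.suffix_link = root if current is None else current.children[a]
        deepest = new_deepest
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = build_suffix_trie("abac")
print(count_nodes(root))  # 10: the root plus the 9 distinct substrings of abac
```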


SMP – Suffix trie
• Given the suffix trie for aba, we want to add a new letter c


SMP – Suffix trie
• To find P, start at the root and keep following edges labeled with P_1, P_2, etc.
• Got stuck? Then P doesn’t exist in T
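A minimal Python sketch of this lookup. To keep the block short, the trie is built here by simply inserting every suffix into nested dictionaries, not with suffix links; the function names are illustrative.

```python
def build_suffix_trie_naive(text):
    """Insert every suffix of text, letter by letter, into a trie of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        for letter in text[start:]:
            node = node.setdefault(letter, {})
    return root


def occurs(trie, pattern):
    """Follow the edges labeled P_1, P_2, ... from the root."""
    node = trie
    for letter in pattern:
        if letter not in node:
            return False  # got stuck: P does not occur in T
        node = node[letter]
    return True


trie = build_suffix_trie_naive("abac")
print(occurs(trie, "bac"), occurs(trie, "ca"))  # True False
```

The lookup takes O(m) steps for a pattern of length m, independent of the length of T.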


SMP – Suffix Array
• Memory usage is O(n)
• Has the same computational power as suffix trie
• Can be constructed in O(n) time (!)
  • But it’s hard to implement
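For illustration only, here is a Python sketch that builds a suffix array by plain sorting and answers queries by binary search. This is not the O(n) construction mentioned above: sorting the suffixes directly is much slower and uses extra temporary memory, but it is short enough to show what the array contains. The function names are assumptions.

```python
def build_suffix_array(text):
    """Start indices of all suffixes of text, in lexicographic order of the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted 0-indexed start positions of pattern in text."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose first m letters are >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All suffixes starting with pattern are adjacent in the sorted order.
    while lo < len(suffix_array) and text[suffix_array[lo]:suffix_array[lo] + m] == pattern:
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


SA = build_suffix_array("banana")
print(SA)                             # [5, 3, 1, 0, 4, 2]
print(find_all("banana", SA, "ana"))  # [1, 3]
```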


Notes
• Always be aware of the null-terminators
• Simple hash works so well in many problems
• If a problem involves rotations of some string, consider concatenating it with itself and see if it helps
• It is a smart idea to have the implementation of suffix arrays and KMP in your notebook.
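The rotation tip can be illustrated with a tiny Python sketch: every rotation of a string s appears as a substring of s + s. The function name `is_rotation` is illustrative, and Python's built-in `in` does the substring search here; in a contest setting it could be replaced by KMP.

```python
def is_rotation(a, b):
    """True if b can be obtained by rotating a."""
    return len(a) == len(b) and b in a + a


print(is_rotation("abcde", "cdeab"))  # True
print(is_rotation("abcde", "abced"))  # False
```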