Page 1

B.B. Karki, LSU - CSC 3102

Algorithm Design Techniques

Brute force
Divide-and-conquer
Decrease-and-conquer
Transform-and-conquer
Space-and-time tradeoffs
Dynamic programming
Greedy techniques

Page 2

Greedy Techniques

Page 3

Basics

Constructing a solution to an optimization problem through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached. On each step, the choice made must be feasible, locally optimal, and irrevocable.

Examples:
Constructing a minimum spanning tree (MST) of a weighted connected graph
  Grows a MST through a greedy inclusion of the nearest vertex or shortest edge to the tree under construction
  Prim's and Kruskal's algorithms
Solving the single-source shortest-path problem
  Finds shortest paths from a given vertex (the source) to all other vertices
  Dijkstra's algorithm
Huffman tree and code
  A binary tree that minimizes the weighted path length from the root to the leaves containing a set of predefined weights
  An optimal prefix-free variable-length encoding scheme.

Page 4

Minimum Spanning Tree

Page 5

MST: Problem Statement

Given n points, connect them in the cheapest possible way so that there is a path between every pair of points
  Graph representation, e.g., a network
  Solve the minimum spanning tree problem.

A spanning tree of a connected graph is its connected acyclic subgraph (a tree) that contains all its vertices.

A minimum spanning tree of a weighted connected graph is its spanning tree of the smallest weight.
  The weight of a tree is the sum of the weights on all its edges (the sum of the lengths of all edges).

If the edge weights are unique, then there is only one minimum spanning tree; otherwise, more than one MST may exist.

[Figure: a complete graph on four vertices a, b, c, d; many spanning trees are possible. Shown: a minimum spanning tree with W = 6 and two other spanning trees with W = 9 and W = 17.]

Page 6

Constructing a MST

Exhaustive-search approach: list all spanning trees and find the one with the minimum weight from the list
  The number of spanning trees grows exponentially with the graph size

Efficient algorithms for finding a MST of a connected weighted graph:
  Prim's algorithm (R.C. Prim, 1957)
    Constructs a MST one vertex at a time by including the nearest vertex to the vertices already in the tree
  Kruskal's algorithm (J.B. Kruskal, 1956)
    Constructs a MST one edge at a time by selecting edges in increasing order of their weights, provided that the inclusion does not create a cycle.

Page 7

Prim’s Algorithm

Constructs a MST through a sequence of expanding subtrees

The initial subtree in such a sequence consists of a single vertex selected arbitrarily from the set V of the n vertices of the graph

On each iteration, we expand the current tree by simply attaching to it the nearest vertex not in the tree
  Find the smallest edge connecting VT to V - VT

The algorithm stops after all the graph's vertices have been included in the tree being constructed
  The total number of iterations is n - 1, since exactly one vertex is added to the tree at each iteration

The MST is then defined by the set of edges used for the tree expansions.

Page 8

Pseudocode

Each vertex u not in the current tree VT needs information about the shortest edge connecting it to a tree vertex v

To find a vertex u* with the smallest weight label in V - VT, attach two labels to each non-tree vertex u:
  the name of the nearest tree vertex and the weight of the corresponding edge
  For vertices that are not adjacent to any tree vertex, the name label is null and the weight is infinity

Split the non-tree vertices u into two sets:
  Fringe: vertices adjacent to at least one tree vertex
  Unseen: vertices yet to be affected by the algorithm.

Algorithm Prim(G)
// Input: weighted connected graph G = 〈V, E〉
// Output: ET, the set of edges composing a MST of G
VT ← {v0}
ET ← Ø
for i ← 1 to |V| - 1 do
    find a minimum-weight edge e* = (v*, u*) among all edges (v, u)
        such that v is in VT and u is in V - VT
    VT ← VT ∪ {u*}
    ET ← ET ∪ {e*}
return ET

Two operations after finding the vertex u* to be added to the tree VT:
  Move u* from the set V - VT to the minimum spanning tree VT
  For each remaining vertex u in V - VT that is connected to u* by a shorter edge than u's current distance label, update u's labels to u* and the weight of the edge between u* and u.
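The labeling scheme above can be sketched in Python (an illustrative version, not from the slides). The non-tree vertices are scanned linearly to find u*, which corresponds to the unordered-array priority queue and the Θ(|V|²) running time discussed later. The example graph is the six-vertex graph used in the lecture's example.

```python
INF = float('inf')

def prim_labels(graph, start):
    """Prim's algorithm with (name, weight) labels, as in the pseudocode.
    graph: {vertex: {neighbor: weight}}. Returns the list of tree edges."""
    nearest = {v: None for v in graph}   # name label: nearest tree vertex (null)
    weight = {v: INF for v in graph}     # weight label: infinity for unseen vertices
    weight[start] = 0
    vt, et = set(), []
    while len(vt) < len(graph):
        # u* = non-tree vertex with the smallest weight label (linear scan)
        u_star = min((v for v in graph if v not in vt), key=weight.get)
        vt.add(u_star)                   # move u* from V - VT to VT
        if nearest[u_star] is not None:
            et.append((nearest[u_star], u_star, weight[u_star]))
        # update labels of remaining vertices connected to u* by a shorter edge
        for u, w in graph[u_star].items():
            if u not in vt and w < weight[u]:
                nearest[u], weight[u] = u_star, w
    return et

# The six-vertex example graph from the lecture (edges bc 1, ef 2, ab 3,
# bf 4, cf 4, af 5, df 5, ae 6, cd 6, de 8).
g = {
    'a': {'b': 3, 'e': 6, 'f': 5},
    'b': {'a': 3, 'c': 1, 'f': 4},
    'c': {'b': 1, 'd': 6, 'f': 4},
    'd': {'c': 6, 'e': 8, 'f': 5},
    'e': {'a': 6, 'd': 8, 'f': 2},
    'f': {'a': 5, 'b': 4, 'c': 4, 'd': 5, 'e': 2},
}
et = prim_labels(g, 'a')
total = sum(w for _, _, w in et)   # 15, matching the example's MST weight
```

Starting from a, the tree edges come out in the same order as the trace on the next page: ab, bc, bf, fe, fd.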

Page 9

Example

Tree vertices and remaining vertices. The selected vertex on each iteration is shown first in each row (bold in the original slide). The labels indicate the nearest tree vertex and edge weight.

a(-, -)    b(a, 3)  c(-, ∞)  d(-, ∞)  e(a, 6)  f(a, 5)
b(a, 3)    c(b, 1)  d(-, ∞)  e(a, 6)  f(b, 4)
c(b, 1)    d(c, 6)  e(a, 6)  f(b, 4)
f(b, 4)    d(f, 5)  e(f, 2)
e(f, 2)    d(f, 5)
d(f, 5)

[Figure: the example weighted graph on vertices a-f (edges ab 3, af 5, ae 6, bc 1, bf 4, cf 4, cd 6, df 5, de 8, ef 2) and the resulting MST with the minimum total weight of 15.]

Page 10

Correctness

Correctness can be proved by induction:
  T_0, consisting of a single vertex, must be a part of any MST.
  For the ith inductive step, assume that T_{i-1} is part of some MST T, and then prove that T_i, generated from T_{i-1}, is also a part of a MST.

Proof by contradiction:
  Let e = (v, u) be the smallest edge connecting a vertex in T_{i-1} to G - T_{i-1}, used to expand T_{i-1} to T_i.
  Suppose there is a spanning tree T not containing e; then show that T is not the MST.
  Because T is a spanning tree, it contains a unique path from v to u, which together with edge e forms a cycle in G. This path has to include another edge f = (v', u') connecting T_{i-1} to G - T_{i-1}.
  T + e - f is another spanning tree, with a smaller weight than T, since e has a smaller weight than f.
  So T was not minimum, which is what we wanted to prove.

[Figure: graph G partitioned into T_{i-1} and G - T_{i-1}, with edge e = (v, u) crossing the partition and another crossing edge f = (v', u') on the path from v to u.]

Page 11

Efficiency

Efficiency depends on the data structures chosen for the graph itself and for the priority queue of the set V - VT, whose vertex priorities are the distances (edge weights) to the nearest tree vertices.

For a graph represented by its weight (adjacency) matrix and the priority queue implemented as an unordered array, the running time is Θ(|V|²).

The priority queue implemented with a min-heap data structure:
  A complete binary tree in which every element is less than or equal to its children. The root contains the smallest element.
  Deletion of the smallest element and insertion of a new element in a min-heap of size n are O(log n) operations, and so is the operation of changing an element's priority.

For a graph represented by its adjacency linked lists and the priority queue implemented as a min-heap, the running time is O(|E| log |V|).

Page 12

Pseudocode with Min-Heap

Use a min-heap to remember, for each vertex, the smallest edge connecting VT with that vertex.

Perform |V| - 1 steps in which we remove the smallest element from the heap, and at most 2|E| steps in which we examine an edge e = (v, u). For each of these steps, we might replace a value on the heap, reducing its weight.

Algorithm PrimWithHeaps(G)
VT ← {v0}
ET ← Ø
make a heap of values (vertex, edge, wt(edge))
for i ← 1 to |V| - 1 do
    let (u*, e*, wt(e*)) have the smallest weight in the heap
    remove (u*, e*, wt(e*)) from the heap
    add u* to VT and e* to ET
    for each edge e = (u*, u) do
        if u is not already in VT
            find the value (u, f, wt(f)) in the heap
            if wt(e) < wt(f)
                replace (u, f, wt(f)) with (u, e, wt(e))
return ET
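A Python sketch of the heap-based approach (illustrative, not from the slides): heapq has no "replace a value" operation, so this version pushes a new entry for each improved crossing edge and discards stale entries when they are popped. This keeps the O(|E| log |V|) bound, since the heap holds at most O(|E|) entries.

```python
import heapq

def prim_mst(graph, start):
    """Prim's algorithm; graph maps each vertex to a list of (weight, neighbor)
    pairs (undirected). Returns (total_weight, tree_edges)."""
    in_tree = {start}
    # Min-heap of candidate edges (weight, u, v) crossing from the tree to V - VT.
    heap = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(heap)
    tree, total = [], 0
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue            # stale entry: v joined the tree via a cheaper edge
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for w2, x in graph[v]:  # new candidate edges crossing the cut
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, tree

# The six-vertex example graph from the Page 9 trace.
g = {
    'a': [(3, 'b'), (6, 'e'), (5, 'f')],
    'b': [(3, 'a'), (1, 'c'), (4, 'f')],
    'c': [(1, 'b'), (6, 'd'), (4, 'f')],
    'd': [(6, 'c'), (8, 'e'), (5, 'f')],
    'e': [(6, 'a'), (8, 'd'), (2, 'f')],
    'f': [(5, 'a'), (4, 'b'), (4, 'c'), (5, 'd'), (2, 'e')],
}
total, tree = prim_mst(g, 'a')   # total == 15, as in the example
```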

Page 13

Kruskal’s Algorithm

A greedy algorithm for constructing a minimum spanning tree (MST) of a weighted connected graph.
  Finds an acyclic subgraph with |V| - 1 edges for which the sum of the edge weights is the smallest.
  Constructs a MST as an expanding sequence of subgraphs, which are always acyclic but are not necessarily connected until the final stage.

The algorithm begins by sorting the graph's edges in non-decreasing order of their weights and then scans the list, adding the next edge on the list to the current subgraph provided that the inclusion does not create a cycle.

Algorithm Kruskal(G)
sort E in non-decreasing order of the edge weights: w(e1) ≤ … ≤ w(e|E|)
ET ← Ø; ecounter ← 0
k ← 0
while ecounter < |V| - 1 do
    k ← k + 1
    if ET ∪ {ek} is acyclic
        ET ← ET ∪ {ek}; ecounter ← ecounter + 1
return ET

Page 14

Example

Sorted list of edges; the selected edges are shown in red in the original slide:

  edge:    bc  ef  ab  bf  cf  af  df  ae  cd  de
  weight:   1   2   3   4   4   5   5   6   6   8

Picking any of the remaining edges (cf, af, ae, cd, de) would create a cycle.

For a graph of 6 vertices, only five edges need to be picked.

[Figure: the same example graph on vertices a-f and the resulting MST (edges bc, ef, ab, bf, df) with total weight = 15.]

Page 15

Kruskal’s Algorithm - A Different View

A progression through a series of forests containing all vertices of a given graph and some of its edges
  The initial forest consists of |V| trivial trees, each comprising a single vertex of the graph
  The final forest consists of a single tree (the MST).

On each iteration, the algorithm takes the next edge (u, v) from the ordered list of the graph's edges, finds the trees containing the vertices u and v, and, if these trees are not the same, unites them into a larger tree by adding the edge. This avoids creating a cycle.

Checking whether two vertices belong to two different trees requires an application of the so-called union-find algorithm

The time efficiency of Kruskal's algorithm is in O(|E| log |E|).


Page 16

Union-Find Algorithm

Kruskal's algorithm requires a dynamic partition of some n-element set S into a collection of disjoint subsets S1, S2, …, Sk.
  Initialization: each disjoint subset is a one-element subset, containing a different element of S.
  Union-find operations: act on the collection of n one-element subsets to give larger subsets.

Abstract data type for the finite set:
  makeset(x): creates a one-element set {x}
  find(x): returns the subset containing x
  union(x, y): constructs the union of the disjoint subsets containing x and y.

Subset's representative: use one element from each of the disjoint subsets in the collection.

Two principal implementations:
  Quick find: uses an array indexed by the elements of the set whose values indicate the representatives of the subsets containing those elements. Each subset is implemented as a linked list.
  Quick union: represents each subset by a rooted tree, with one element per node and the root's element as the subset's representative.
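Putting the pieces together, Kruskal's algorithm with a quick-union-style union-find can be sketched in Python (illustrative, not from the slides; the edge list is the one from the Page 14 example, and path compression is an extra refinement beyond the basic quick union described above):

```python
def kruskal(n, edges):
    """Kruskal's algorithm; edges is a list of (weight, u, v) triples.
    Uses quick union, with path compression added in find."""
    parent = {}
    def find(x):                     # root of the tree containing x
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression (halving)
            x = parent[x]
        return x
    tree, total = [], 0
    for w, u, v in sorted(edges):    # non-decreasing order of weights
        ru, rv = find(u), find(v)
        if ru != rv:                 # different trees, so no cycle is created
            parent[ru] = rv          # union: hang one root under the other
            tree.append((u, v, w))
            total += w
            if len(tree) == n - 1:   # |V| - 1 edges form the MST
                break
    return total, tree

# Edge list from the example (bc 1, ef 2, ab 3, bf 4, cf 4, af 5, df 5,
# ae 6, cd 6, de 8).
edges = [(1, 'b', 'c'), (2, 'e', 'f'), (3, 'a', 'b'), (4, 'b', 'f'),
         (4, 'c', 'f'), (5, 'a', 'f'), (5, 'd', 'f'), (6, 'a', 'e'),
         (6, 'c', 'd'), (8, 'd', 'e')]
total, tree = kruskal(6, edges)   # total == 15, selecting bc, ef, ab, bf, df
```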

Page 17

Single-Source Shortest-Paths Problem

Page 18

Problem Statement

For a given vertex, called the source, in a weighted connected graph, find the shortest paths to all its other vertices.
  Find a family of paths, each leading from the source to a different vertex in the graph.
  The resulting tree is a spanning tree.

A variety of applications exist, e.g., finding the shortest route between two cities.

Dijkstra's algorithm finds the shortest paths to the graph's vertices in order of their distance from a given source.
  Works for a graph with nonnegative edge weights.

Different versions of the problem:
  Single-pair shortest-path problem
  Single-destination shortest-paths problem
  All-pairs shortest-paths problem
  Traveling salesman problem.

[Figure: a tree representing all possible shortest paths from the source a to the four vertices b, d, c, and e, of path lengths 3, 5, 7, and 9, respectively.]

If the source is different, then a different tree results.

Page 19

Dijkstra’s Algorithm

Dijkstra's algorithm works in much the same way as Prim's algorithm.
  Both construct an expanding subtree of vertices by selecting the next vertex from the priority queue of the remaining vertices, and both use similar labeling.
  However, the priorities are computed differently: Dijkstra's algorithm compares path lengths (obtained by adding edge weights), while Prim's algorithm compares the edge weights as given.

The algorithm works by first finding the shortest path from the source to the vertex nearest to it, then to a second nearest, and so on.

In general, before its ith iteration commences, the algorithm has already identified the shortest paths to i - 1 other vertices nearest to the source.
  These vertices, the source, and the edges of the shortest paths leading to them from the source form a subtree T_i of the given graph.
  The next vertex nearest to the source can be found among the vertices adjacent to the vertices of T_i. These adjacent vertices are referred to as "fringe vertices"; they are the candidates from which the algorithm selects the next vertex nearest to the source.

Page 20

Labeling

For every fringe vertex u, the algorithm computes the sum of the distance to the nearest tree vertex v and the length d_v of the shortest path from the source to v, and selects the vertex with the smallest such sum.

Each vertex has two labels:
  The numeric label d indicates the length of the shortest path from the source to this vertex found by the algorithm so far; when a vertex is added to the tree, d indicates the length of the shortest path from the source to that vertex.
  The other label indicates the name of the next-to-last vertex on such a path, i.e., the parent of the vertex in the tree being constructed.


With such labeling, finding the next nearest vertex u* becomes the simple task of finding a fringe vertex with the smallest d value.

After a vertex u* to be added to the tree is identified, perform two operations:
  Move u* from the fringe to the set of tree vertices.
  For each remaining fringe vertex u that is connected to u* by an edge of weight w(u*, u) such that d_u* + w(u*, u) < d_u, update the labels of u by u* and d_u* + w(u*, u), respectively.

Page 21

Pseudocode

Shows explicit operations on two sets of labeled vertices:
  The set VT of vertices for which a shortest path has already been found.
  The priority queue Q of the fringe vertices.

Priority-queue operations:
  Initialize: initialize the vertex priority queue to empty.
  Insert: insert a vertex with its priority into the priority queue.
  Decrease: update the priority of a vertex with its new, smaller d value.
  DeleteMin: delete the minimum-priority element.

Algorithm Dijkstra(G, s)
// Input: weighted connected graph G = 〈V, E〉 and its vertex s
// Output: the length dv of a shortest path from s to v, and its
//         penultimate vertex pv, for every vertex v in V
Initialize(Q)
for every vertex v in V do
    dv ← ∞; pv ← Ø
    Insert(Q, v, dv)
ds ← 0; Decrease(Q, s, ds)
VT ← Ø
for i ← 0 to |V| - 1 do
    u* ← DeleteMin(Q)
    VT ← VT ∪ {u*}
    for every vertex u in V - VT that is adjacent to u* do
        if du* + w(u*, u) < du
            du ← du* + w(u*, u); pu ← u*
            Decrease(Q, u, du)
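A Python sketch of the algorithm (illustrative, not from the slides): heapq provides no Decrease operation, so improved distances are pushed as new heap entries and stale entries are skipped when popped. The example graph is reconstructed from the trace on the next page; its edge weights are therefore an assumption.

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm; graph maps each vertex to (weight, neighbor)
    pairs, all weights nonnegative. Returns (dist, parent), where parent[v]
    is the penultimate vertex p_v on the shortest path from source to v."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]      # priority = current distance label d_v
    done = set()              # the set VT of finished vertices
    while heap:
        d, u = heapq.heappop(heap)        # DeleteMin
        if u in done:
            continue                      # stale entry: priority was improved later
        done.add(u)
        for w, v in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):   # d_u* + w(u*, u) < d_u
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))    # stands in for Decrease
    return dist, parent

# Example graph reconstructed from the next page's trace (an assumption):
# ab 3, ad 7, bc 4, bd 2, cd 5, ce 6, de 4.
g = {
    'a': [(3, 'b'), (7, 'd')],
    'b': [(3, 'a'), (4, 'c'), (2, 'd')],
    'c': [(4, 'b'), (5, 'd'), (6, 'e')],
    'd': [(7, 'a'), (2, 'b'), (5, 'c'), (4, 'e')],
    'e': [(6, 'c'), (4, 'd')],
}
dist, parent = dijkstra(g, 'a')   # dist: a 0, b 3, d 5, c 7, e 9
```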

Page 22

Example

Tree vertices and remaining vertices. The selected vertex on each iteration is shown first in each row (bold in the original slide). The labels indicate the next-to-last tree vertex and the path length.

a(-, 0)    b(a, 3)    c(-, ∞)    d(a, 7)    e(-, ∞)
b(a, 3)    c(b, 3+4)  d(b, 3+2)  e(-, ∞)
d(b, 5)    c(b, 7)    e(d, 5+4)
c(b, 7)    e(d, 9)
e(d, 9)

[Figure: the example weighted graph on vertices a-e and the resulting shortest-path tree.]

The shortest paths of the four vertices from the source vertex a:
  b: a - b, of length 3
  d: a - b - d, of length 5
  c: a - b - c, of length 7
  e: a - b - d - e, of length 9

Page 23

Correctness

Correctness can be proved by induction:
  For i = 1, the assertion is true for the trivial path from the source to itself.
  For the general step, assume that it is true for the algorithm's tree T_i with i vertices. Let v_{i+1} be the vertex to be added next to the tree by the algorithm.
  All vertices on a shortest path from s to v_{i+1} must be in T_i, because they are closer to s than v_{i+1}.
  Hence, the (i+1)st closest vertex can be selected as the algorithm does: by minimizing the sum of d_v and the length of the edge from v to an adjacent vertex not in the tree.
  d_v is the length of the shortest path from s to v (contained in T_i) by the induction assumption.

Page 24

Efficiency

The time efficiency of Dijkstra's algorithm depends on the data structures chosen for the graph itself and for the priority queue.

For a graph represented by its weight matrix and the priority queue implemented as an unordered array, the running time is Θ(|V|²).

For a graph represented by its adjacency linked lists and the priority queue implemented as a min-heap, the running time is O(|E| log |V|).

Page 25

Huffman Tree and Code

Page 26

Huffman Tree and Code

Huffman trees allow us to encode a text that comprises characters from some n-character alphabet.

Huffman code represents an optimal prefix-free variable-length scheme that assigns bit strings to characters based on their frequencies in a given text.
  Uses a greedy construction of a binary tree whose leaves represent the alphabet characters and whose left and right edges are labeled with 0's and 1's.
  Assigns shorter bit strings to high-frequency characters and longer ones to low-frequency characters.

Page 27

Huffman’s Algorithm

Initialize n one-node trees labeled with the characters of the alphabet, with the frequency of each character recorded as the weight in its tree's root.

Repeatedly find the two trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight, until a single tree remains.
  The resulting binary tree is called a Huffman tree.

Obtain the codeword of a character by recording the labels (0 or 1) on the simple path from the root to the character's leaf.
  This is the Huffman code; it provides an optimal encoding.

Dynamic Huffman encoding:
  The coding tree is updated each time a new character is read from the source text.

Page 28

Constructing a Huffman Coding Tree

See the textbook for the following five-character alphabet {A, B, C, D, -} example:

  character:    A     B     C     D     -
  probability:  0.35  0.10  0.20  0.20  0.15
  codeword:     11    100   00    01    101

[Figure: the Huffman coding tree. B (0.1) and - (0.15) are combined into a tree of weight 0.25; C (0.2) and D (0.2) into 0.4; 0.25 and A (0.35) into 0.6; finally 0.4 and 0.6 into the root of weight 1.0.]

Expected number of bits per character:
  l-bar = Σ_{i=1}^{5} l_i p_i = 2.25

Variance:
  Var = Σ_{i=1}^{5} (l_i - l-bar)² p_i ≈ 0.19

In the fixed-length scheme each codeword would contain three bits. So the Huffman code saves 3 - 2.25 = 0.75 bits per character, a compression of 25%.
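The construction and the expected-length calculation for this alphabet can be checked with a short Python sketch (illustrative, not from the slides; ties among equal weights may yield different codewords than the table, but the codeword lengths and the expected length are the same):

```python
import heapq

def huffman_codes(freqs):
    """Greedy Huffman construction with a min-heap. A tree is either a
    character or a (left, right) pair; the counter breaks ties so that
    trees themselves are never compared."""
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # the two trees of smallest weight
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):               # record 0/1 labels root-to-leaf
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

freqs = {"A": 0.35, "B": 0.1, "C": 0.2, "D": 0.2, "-": 0.15}
codes = huffman_codes(freqs)
avg = sum(len(codes[ch]) * p for ch, p in freqs.items())              # 2.25
var = sum((len(codes[ch]) - avg) ** 2 * p for ch, p in freqs.items())  # ~0.19
```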