
Jan 14, 2016



Instructor: Shengyu Zhang


Content

Two problems

Minimum Spanning Tree

Huffman encoding

One approach: greedy algorithms


Example 1: Minimum Spanning Tree


MST: Problem and Motivation

Suppose we have computers connected by wires, as given in the graph. Each wire has a renting cost. We want to select some wires such that all computers are connected (i.e., every two can communicate).

Algorithmic question: How to select a subset of wires with the minimum renting cost?

Answer to this graph?

[Figure: example graph of computers and wires, with the renting cost on each edge]


More precisely

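Given a weighted graph $G = (V, E)$ with edge weights $w$, we want a subgraph $G' = (V, E')$ with $E' \subseteq E$, such that
all vertices are connected in $G'$, and
the total weight $w(E') = \sum_{e \in E'} w(e)$ is minimized.

Observation: The answer is a tree (a connected graph without cycles). If $G'$ contained a cycle, deleting one edge of that cycle would keep $G'$ connected and lower its weight, since renting costs are positive.

Spanning tree: a tree containing all vertices in $G$.

Question: Find a spanning tree with minimum weight. The problem is thus called Minimum Spanning Tree (MST).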

[Figure: the same example graph]


MST: The abstract problem

Input: A connected weighted graph $G = (V, E)$ with edge weights $w$.

Output: A spanning tree with minimum total weight, i.e., a spanning tree whose weight is the minimum over all spanning trees of $G$.

Any algorithm?

[Figure: the same example graph]


Methodology 4: Start from a naïve solution. See whether it works well enough. If not, try to improve it.

A first attempt may not be correct, but that's fine. The key is that it gives you a chance to understand the problem.


What if I’m really stingy?

I'll first pick the cheapest edge. I'll then again pick the cheapest one among the remaining edges. I'll just keep doing this … as long as no cycle is caused … until a cycle is unavoidable.

Then I've got a spanning tree!
No cycle.
Connected: otherwise I could still pick some edge without causing a cycle.

Concern: Is there a better spanning tree?

[Figure: a second example weighted graph]


Kruskal's Algorithm

What we did just now is Kruskal’s algorithm.

Repeatedly add the next lightest edge that doesn't produce a cycle (in case of a tie, break it arbitrarily), until finally reaching a tree: that's the answer!
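A minimal Python sketch of the procedure above, assuming vertices are numbered 0 to n-1 and edges are given as (weight, u, v) tuples; both representation choices and the function name `kruskal_naive` are ours, not the slides'. It detects cycles with a naive group-label array that is relabelled on every merge, which mirrors the "groups merge into larger groups" picture but costs O(|V|) per merge.

```python
def kruskal_naive(n, edges):
    """Kruskal's algorithm with a naive group-label cycle check (hypothetical helper).

    n: number of vertices, assumed to be labelled 0 .. n-1.
    edges: list of (weight, u, v) tuples.
    Returns (total_weight, chosen_edges).
    """
    group = list(range(n))  # group[v]: label of the group currently containing v
    chosen, total = [], 0
    # sorted() is stable, so ties keep their input order (one arbitrary tie-break).
    for w, u, v in sorted(edges, key=lambda e: e[0]):
        if group[u] == group[v]:
            continue  # same group: adding (u, v) would close a cycle
        chosen.append((u, v, w))
        total += w
        old, new = group[v], group[u]
        for x in range(n):  # merge the two groups by relabelling: O(n) per merge
            if group[x] == old:
                group[x] = new
        if len(chosen) == n - 1:  # a spanning tree on n vertices has n - 1 edges
            break
    return total, chosen


# Hypothetical 4-vertex graph (not the one on the slides).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal_naive(4, edges))  # (6, [(0, 1, 1), (1, 3, 2), (1, 2, 3)])
```

The relabelling loop is exactly the cost that a union-find structure is meant to cut down.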


Illustrating an execution of the algorithm: at first all vertices are separated. Little by little, they merge into groups. Groups merge into larger groups. Finally, all groups merge into one. That's the spanning tree output by the algorithm.

[Figure: Kruskal's algorithm executed on the second example graph]


Correctness: prove by induction

Proof plan: We will use induction to prove that at any point in time, the edges found are part of an MST.

At any point in time, we've found some edges $X$, which connect the vertices into groups $G_1, G_2, \ldots$

By induction, $X$ belongs to some MST $T$.


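Correctness: prove by induction

Suppose Kruskal's algorithm picks $e$ in the next step, connecting, say, $G_1$ and $G_2$.
If $e \in T$, done.
If $e \notin T$, adding $e$ into $T$ would produce a cycle. The cycle must cross the cut $(G_1, V - G_1)$ via at least one other edge $e'$. Since $e$ is the lightest one among all crossing edges (every edge lighter than $e$ was considered earlier and now lies inside a single group), $w(e) \le w(e')$. Also $e' \notin X$, because no edge of $X$ crosses this cut.
Let $T' = T \cup \{e\} \setminus \{e'\}$; then $w(T') \le w(T)$, and $T'$ still contains $X \cup \{e\}$.
$T'$ is also a spanning tree: it is connected and has $|V| - 1$ edges.
So $T'$ is also an MST. Induction step done.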


Implementing Kruskal's Algorithm:

Initialization:
    Sort the edges by weight
    create $\{v\}$ for each $v \in V$
    $X = \emptyset$
for all edges $e = \{u, v\}$, in increasing order of weight:
    if adding $e$ to $X$ doesn't cause a cycle:
        add edge $e$ to $X$

Question: What's not clearly specified yet?


Implementation

What do we need?
We need to maintain a collection of groups.
Each group is a subset of vertices; different subsets are disjoint.
For a pair $(u, v)$, we want to know whether adding this edge causes a cycle.
If $u$ and $v$ are in the same subset now, then adding $(u, v)$ will cause a cycle. The converse is also true.
So we need to find the two subsets containing $u$ and $v$, respectively.
If no cycle is caused, then we merge the two sets containing $u$ and $v$.


Data structure

Union-Find data structure for disjoint sets:
find($x$): to which set does $x$ belong?
union($x$, $y$): merge the sets containing $x$ and $y$.

Using this terminology, let's rewrite the algorithm and analyze the complexity…


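Kruskal's Algorithm: rewritten, complexity

Initialization:
    Sort the edges by weight                  - $O(|E| \log |E|)$
    create $\{v\}$ for each $v \in V$          - $O(|V|)$
    $X = \emptyset$                            - $O(1)$
for all edges $\{u, v\}$, in increasing order of weight:
    if find($u$) ≠ find($v$):                  - 2*cost-of-find
        add edge $\{u, v\}$ to $X$              - $O(1)$
        union($u$, $v$)                        - cost-of-union

How many finds? $2|E|$
How many unions? $|V| - 1$
Total: $O(|E| \log |E|) + 2|E| \cdot$ find-cost $+ (|V| - 1) \cdot$ union-cost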
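To make the operation counts concrete, here is a small Python sketch that runs the rewritten loop against a find/union interface and counts the calls. The class `CountingUnionFind` and its counters are hypothetical illustration code; union here simply hangs one root under the other with no balancing, so only the counts (not the per-call costs) are meaningful.

```python
class CountingUnionFind:
    """Disjoint sets as parent pointers, counting find/union calls (hypothetical, no balancing)."""

    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}  # create {v} for each v
        self.finds = 0
        self.unions = 0

    def _root(self, x):
        while self.parent[x] != x:  # walk up to the root, which names the set
            x = self.parent[x]
        return x

    def find(self, x):
        self.finds += 1
        return self._root(x)

    def union(self, x, y):
        self.unions += 1
        self.parent[self._root(x)] = self._root(y)  # hang x's tree under y's root


def kruskal(vertices, edges):
    """edges: list of (weight, u, v). Returns (X, number of finds, number of unions)."""
    uf = CountingUnionFind(vertices)
    X = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):  # increasing order of weight
        if uf.find(u) != uf.find(v):
            X.append((u, v, w))
            uf.union(u, v)
    return X, uf.finds, uf.unions


# Hypothetical 4-vertex, 5-edge graph (not the one on the slides).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
X, finds, unions = kruskal(range(4), edges)
print(finds, unions)  # 10 3, i.e. 2|E| finds and |V| - 1 unions
```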


Data structure for union-find

We have used various data structures: queue, stack, tree.

A rooted tree is good here.
It's efficient: it can have (cover) $n$ leaves with depth only $\log_k n$, where $k$ is the number of children of each node.
Each tree has a natural id: the root.

We now use a tree for each connected component.
find($x$): return the root of $x$'s tree. So cost-of-find depends on height(tree). Want: small height.
union($x$, $y$): somehow make the two trees into one. The union cost … depends on the implementation.


union

Recall: a tree is constructed by a sequence of union operations.

So we want to design a union algorithm such that
the resulting tree is short, and
the cost of union itself is not large either.

A natural idea: let the shorter tree become part of the taller tree, placing it right under the root of the taller tree.

To this end, we need to maintain the height information of each tree, which is pretty easy.


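Details for union($x$, $y$):

    $r_x$ = find($x$)
    $r_y$ = find($y$)
    if $r_x = r_y$: return
    if height($r_x$) > height($r_y$):
        parent($r_y$) = $r_x$
    else:
        parent($r_x$) = $r_y$
        if height($r_x$) = height($r_y$): height($r_y$) = height($r_y$) + 1

[Figure: the trees containing $x$ and $y$, with roots $r_x$ and $r_y$]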
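The union() details above can be sketched in Python as follows. This is a minimal sketch under our own assumptions: elements are hashable keys, and the names `UnionFind`, `parent` and `height` are ours, not the slides'. Heights are only kept up to date for roots, which is all union() needs.

```python
class UnionFind:
    """Union-find with union by height (hypothetical class name)."""

    def __init__(self, elements):
        self.parent = {x: x for x in elements}  # each element starts as its own root
        self.height = {x: 0 for x in elements}  # height of the tree rooted at x (roots only)

    def find(self, x):
        # Follow parent pointers up to the root; cost is O(height of the tree).
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return  # already in the same set
        if self.height[rx] > self.height[ry]:
            self.parent[ry] = rx  # shorter tree goes right under the taller root
        else:
            self.parent[rx] = ry
            if self.height[rx] == self.height[ry]:
                self.height[ry] += 1  # equal heights: the merged tree grows by 1


uf = UnionFind(range(8))
uf.union(0, 1)
uf.union(2, 3)
uf.union(0, 2)  # merges two trees of height 1 into one of height 2
print(uf.find(1) == uf.find(3), uf.height[uf.find(0)])  # True 2
```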


How good is this?

How high will the resulting tree be?
[Claim] Any node of height $h$ has a subtree of size at least $2^h$.
Height of node $x$: the height of the subtree under $x$. Size: the number of nodes.
Proof: Induction on $h$. The height increases (by 1) only when two trees of equal height $h - 1$ merge. By induction, each tree has size $\ge 2^{h-1}$, so the new tree has size $\ge 2 \cdot 2^{h-1} = 2^h$. Done.
Thus the height of a tree at any point is never more than $\log_2 n$, where $n = |V|$.
So the cost of find is at most $O(\log n)$, and thus the cost of union is also $O(\log n)$.
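The claim can be spot-checked empirically with a short Python sketch (the helper `build_and_measure` is hypothetical). It applies union by height to a list of pairs, asserts size $\ge 2^{\text{height}}$ after every merge, and reports the tallest tree. Pairwise "tournament" merging of $n = 1024$ singletons reaches exactly $\log_2 n = 10$; random unions stay within the bound.

```python
import math
import random


def build_and_measure(n, pairs):
    """Apply union by height to the pairs; return the maximum root height."""
    parent = list(range(n))
    height = [0] * n
    size = [1] * n  # tracked only to check the size claim

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for x, y in pairs:
        rx, ry = find(x), find(y)
        if rx == ry:
            continue
        if height[rx] > height[ry]:
            rx, ry = ry, rx  # make rx the root of the shorter (or equal) tree
        parent[rx] = ry
        size[ry] += size[rx]
        if height[rx] == height[ry]:
            height[ry] += 1
        assert size[ry] >= 2 ** height[ry]  # the claim: height h implies size >= 2^h
    return max(height[r] for r in range(n) if parent[r] == r)


n = 1024
# Merge neighbours, then pairs of pairs, and so on: the worst case for height.
tournament = [(i, i + s) for s in (2 ** j for j in range(10)) for i in range(0, n, 2 * s)]
print(build_and_measure(n, tournament), math.log2(n))  # 10 10.0

random.seed(0)
pairs = [(random.randrange(n), random.randrange(n)) for _ in range(5000)]
print(build_and_measure(n, pairs) <= math.log2(n))  # True
```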


Cost of union?

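    $r_x$ = find($x$)                                           - $O(\log n)$
    $r_y$ = find($y$)                                           - $O(\log n)$
    if $r_x = r_y$: return                                     - $O(1)$
    if height($r_x$) > height($r_y$): parent($r_y$) = $r_x$     - $O(1)$
    else: parent($r_x$) = $r_y$                                 - $O(1)$
        if height($r_x$) = height($r_y$): height($r_y$) += 1    - $O(1)$

Total cost of union: $O(\log n)$.
Total cost of Kruskal's algorithm (using $\log |E| \le 2 \log |V|$):
$O(|E| \log |E|) + 2|E| \cdot O(\log |V|) + (|V| - 1) \cdot O(\log |V|) = O(|E| \log |V|)$.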


Don't confuse the two types of trees:
Type 1: (parts of) the spanning tree. Red edges.
Type 2: the tree data structure used for implementing union-find operations. Blue edges.

[Figure: the second example graph, with spanning-tree edges in red and union-find tree edges in blue]


Question?

Next: another MST algorithm.


Next: another MST algorithm

In Kruskal's algorithm, we get the spanning tree by merging smaller trees.

Next, we'll present an algorithm that always maintains one tree throughout the process. The size of the tree grows from 1 to $|V|$.

The whole algorithm is reminiscent of Dijkstra's algorithm for shortest paths.


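Execution on the same example

We first pick an arbitrary vertex $v_1$ to start with, and maintain a set $S = \{v_1\}$.
Over all edges from $v_1$, find a lightest one. Say it's $(v_1, v_2)$; now $S = \{v_1, v_2\}$.
Over all edges from $\{v_1, v_2\}$ (to $V - S$), find a lightest one; say it reaches $v_3$.
…
In general, suppose we already have the subset $S$. Then over all edges from $S$ to $V - S$, find a lightest one $(u, v)$, with $u \in S$ and $v \in V - S$.
Update: $S \leftarrow S \cup \{v\}$.
…
Finally we get a tree. That's the answer.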

[Figure: the second example graph, with vertices labeled $v_1, \ldots, v_9$]


Key property

Currently we have the set $S$. We want to maintain the following property:
The edges picked form a tree on $S$.
The tree is part of a correct MST $T$.

When adding one more node from $V - S$ to $S$, we want to keep the property.

Question: Which node to add? Recall Methodology 2: Good properties often happen at extremal points.

Finally, $S = V$, thus the property implies that our final tree is a correct MST for $G$.

[Figure: the set $S$ (containing $v_1, \ldots, v_5$) and $V - S$]


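Key property: the picked edges $X$ are part of an MST $T$.
Consider all edges from $S$ to $V - S$: we pick the lightest one, $e$ (and add its endpoint in $V - S$ to $S$).

Will show: $X \cup \{e\}$ is part of some MST.
By induction, ∃ an MST $T$ containing $X$.
If $T$ contains $e$, done.
Else: adding $e$ into $T$ produces a cycle. The cycle has some other edge(s) $e'$ crossing $S$ and $V - S$; since all edges of $X$ lie inside $S$, $e' \notin X$.
Replacing $e'$ with $e$: $T' = T \cup \{e\} \setminus \{e'\}$.
Removing any edge in the cycle leaves a spanning tree, so $T'$ is a spanning tree containing $X \cup \{e\}$.
$T'$ is only better: $w(T') = w(T) + w(e) - w(e') \le w(T)$, since $e$ is the lightest crossing edge.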

[Figure: the cut between $S$ and $V - S$, with the crossing edges $e$ and $e'$]


Prim’s algorithm

Implementation: very similar to Dijkstra's algorithm.

Now the cost function for a vertex $v$ in $V - S$ is the minimal weight $w(u, v)$ over all $u \in S$. Details omitted; see textbook.

Complexity: also $O((|V| + |E|) \log |V|)$ if we use a binary min-heap as before; $O(|E| + |V| \log |V|)$ if a Fibonacci heap is used.
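A minimal Python sketch of Prim's algorithm with a binary heap. Python's `heapq` has no decrease-key, so instead of updating a vertex's cost this sketch pushes a new candidate edge and skips stale entries when they are popped ("lazy deletion"); that swap is our choice, and it still runs in $O(|E| \log |E|) = O(|E| \log |V|)$. The adjacency-dict input format and the function name `prim` are assumptions; vertices must be comparable (e.g. ints) because heap ties compare them.

```python
import heapq


def prim(adj, start):
    """Prim's MST with heapq and lazy deletion.

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs
         of an undirected, connected graph (assumed input format).
    Returns (total_weight, tree_edges) with tree edges as (u, v, w).
    """
    in_S = {start}
    tree, total = [], 0
    # Heap entries (w, u, v): a candidate edge from u in S to v.
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(in_S) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_S:
            continue  # stale entry: v was already added via a lighter edge
        in_S.add(v)  # (u, v) is a lightest edge crossing (S, V - S)
        tree.append((u, v, w))
        total += w
        for x, wx in adj[v]:
            if x not in in_S:
                heapq.heappush(heap, (wx, v, x))
    return total, tree


# Hypothetical 4-vertex graph (not the one on the slides).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
adj = {v: [] for v in range(4)}
for w, u, v in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(prim(adj, 0))  # (6, [(0, 1, 1), (1, 3, 2), (1, 2, 3)])
```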


Extra: Divide and Conquer?

Consider the following algorithm:
Divide the graph into two balanced parts, about $|V|/2$ vertices each.
Find a lightest crossing edge.
Recursively solve the two subgraphs.

Is this correct?
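One way to probe the question is to test the divide-and-conquer rule on a tiny graph. The 4-vertex graph and the split below are hypothetical, chosen by us: edges inside each half are expensive, while two crossing edges are cheap. The sketch compares the divide-and-conquer total against a plain Kruskal-style MST.

```python
def mst_weight(vertices, edges):
    """MST weight by Kruskal with a simple parent map; edges are (w, u, v)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total


# Hypothetical graph, split into S = {a, b} and V - S = {c, d}.
edges = [(10, "a", "b"), (10, "c", "d"), (1, "a", "c"), (1, "b", "d")]
S, R = {"a", "b"}, {"c", "d"}


def inside(part):
    return [e for e in edges if e[1] in part and e[2] in part]


crossing = [e for e in edges if (e[1] in S) != (e[2] in S)]
dc = min(crossing)[0] + mst_weight(S, inside(S)) + mst_weight(R, inside(R))
best = mst_weight(S | R, edges)
print(dc, best)  # 21 12
```

Here the true MST uses both cheap crossing edges, which the "one crossing edge" rule cannot do.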


Example 2: Huffman code


Huffman encoding

Suppose that we have a sequence $s$ of symbols. Each symbol comes from an alphabet $\Sigma$ of size $n$.

The symbols in $\Sigma$ appear in $s$ with different frequencies $f_1, \ldots, f_n$, where $f_i$ is the number of times the $i$-th symbol appears in $s$.

Goal: encode the symbols in $\Sigma$ such that the encoded sequence has the shortest length.


Example

Naive encoding: … Number of bits: …

Consider this: … Number of bits: …


Requirement for the code

The length can be variable: different symbols can have codewords of different lengths.

Prefix-free: no codeword can be a prefix of another codeword. Otherwise decoding can be ambiguous: if one codeword is a prefix of another, a bit string may split into codewords in more than one way.

Question: How to construct an optimal prefix-free code?
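A small Python sketch of the ambiguity. The code table `bad` (A→0, B→01, C→1) is a hypothetical example of ours, not from the slides; `good` is the code A→0, B→11, C→100, D→101 used elsewhere in these slides. `all_decodings` enumerates every way to split a bit string into codewords (exponential in general; for illustration only).

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def all_decodings(bits, code):
    """Every way to split `bits` into codewords, as lists of symbols."""
    if not bits:
        return [[]]
    result = []
    for sym, word in code.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], code):
                result.append([sym] + rest)
    return result


bad = {"A": "0", "B": "01", "C": "1"}  # hypothetical, not prefix-free
good = {"A": "0", "B": "11", "C": "100", "D": "101"}
print(is_prefix_free(bad), all_decodings("01", bad))  # False [['A', 'C'], ['B']]
print(is_prefix_free(good), all_decodings("0100", good))  # True [['A', 'C']]
```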


Prefix-free code and binary tree

Optimal prefix-free code ⟷ a full binary tree.
Full: each internal node has two children.
Symbol ⟷ leaf.
Encoding of a symbol: the path from the root to the leaf for that symbol.
Decoding: follow the path to get the symbol, then return to the root.

[Figure: full binary tree with leaves A, B, C, D; each left branch labeled 0, each right branch labeled 1]

Path: represented by a sequence of 0's and 1's.
0: left branch. 1: right branch.
$A \to 0$, $B \to 11$, $C \to 100$, $D \to 101$
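The encode/decode procedure above can be sketched in Python for the code A→0, B→11, C→100, D→101. The tree is stored as nested dicts keyed by "0"/"1" with symbols at the leaves; that representation and the helper names are our assumptions, and `build_tree` assumes its input is prefix-free.

```python
def build_tree(code):
    """Binary tree of a prefix-free code as nested dicts keyed by '0' / '1'."""
    root = {}
    for sym, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = sym  # leaf: store the symbol
    return root


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]  # 0: left branch, 1: right branch
        if not isinstance(node, dict):  # reached a leaf
            out.append(node)
            node = root  # return to the root
    return "".join(out)


code = {"A": "0", "B": "11", "C": "100", "D": "101"}
tree = build_tree(code)
bits = encode("ABACAD", code)
print(bits, decode(bits, tree))  # 01101000101 ABACAD
```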


Optimal tree?

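Recall question: construct an optimal code. Optimal: the total length of the encoded sequence is minimized.

New question: How to construct an optimal tree $T$?
Namely, find $\arg\min_T \mathrm{cost}(T)$, where $\mathrm{cost}(T) = \sum_{i=1}^{n} f_i \cdot \mathrm{depth}_T(i)$, $f_i$ is the frequency of the $i$-th symbol, and $\mathrm{depth}_T(i)$ is the depth of its leaf in $T$ (i.e., its codeword length).

Recall Methodology 3: Analyze properties of an optimal solution.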
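As a quick illustration of $\mathrm{cost}(T)$, the Python sketch below uses codeword length as leaf depth. It takes the frequencies A:20, B:10, C:5, D:5 from the running example in these slides, and compares the tree code A→0, B→11, C→100, D→101 with a fixed-length 2-bit code (the fixed-length table is our own illustration).

```python
def cost(freq, code):
    """cost(T) = sum over symbols of f_i * depth_T(i), with depth = codeword length."""
    return sum(f * len(code[s]) for s, f in freq.items())


freq = {"A": 20, "B": 10, "C": 5, "D": 5}
tree_code = {"A": "0", "B": "11", "C": "100", "D": "101"}
fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}  # hypothetical 2-bit code
print(cost(freq, tree_code), cost(freq, fixed_code))  # 70 80
```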


In an optimal tree

[Fact] In some optimal tree, the two symbols with the smallest frequencies are at the bottom, as children of a lowest internal node.
Otherwise, say one of them is not; then switch it with whichever symbol is at the bottom. This does not increase the cost.

This suggests a greedy algorithm:
Find $i, j$ with the smallest frequencies $f_i, f_j$.
Add a node $k$ as the parent of $i$ and $j$.
Remove $f_i, f_j$ and add $f_k = f_i + f_j$.
Repeat the above until a tree with $n$ leaves is formed.


Algorithm, formal description

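Input: An array of frequencies $f[1 \ldots n]$
Output: An encoding tree with $n$ leaves

let $H$ be a priority queue of integers, ordered by $f$
for $i = 1$ to $n$:
    insert($H$, $i$)
for $k = n + 1$ to $2n - 1$:
    $i$ = delete-min($H$); $j$ = delete-min($H$)
    create a node numbered $k$ with children $i, j$
    $f[k] = f[i] + f[j]$
    insert($H$, $k$)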
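A Python sketch of this formal description, using `heapq` as the priority queue ordered by $f$ (ties are broken by node number, a choice of ours). Nodes are numbered from 0 rather than 1, so leaves are 0 to n-1 and the root is 2n-2; the helpers `huffman` and `codewords` are hypothetical names. On A:20, B:10, C:5, D:5 it produces leaf depths 1, 2, 3, 3; the exact bit labels may differ from the slides' tree, but the cost is the same.

```python
import heapq


def huffman(freqs):
    """Huffman's algorithm on numbered nodes.

    freqs: list of n >= 2 frequencies; leaves are nodes 0 .. n-1.
    Returns (children, f) where children[k] = (i, j) for internal node k.
    """
    n = len(freqs)
    f = list(freqs)  # grows to 2n - 1 entries
    H = [(f[i], i) for i in range(n)]  # priority queue ordered by f
    heapq.heapify(H)
    children = {}
    for k in range(n, 2 * n - 1):
        _, i = heapq.heappop(H)  # delete-min
        _, j = heapq.heappop(H)  # delete-min
        children[k] = (i, j)  # create node k with children i, j
        f.append(f[i] + f[j])  # f[k] = f[i] + f[j]
        heapq.heappush(H, (f[k], k))  # insert k
    return children, f


def codewords(children, n):
    """Read codewords off the tree: 0 for the first child, 1 for the second."""
    code, stack = {}, [(2 * n - 2, "")]  # start at the root
    while stack:
        node, path = stack.pop()
        if node < n:
            code[node] = path  # leaf
        else:
            left, right = children[node]
            stack.append((left, path + "0"))
            stack.append((right, path + "1"))
    return code


children, f = huffman([20, 10, 5, 5])  # A, B, C, D
print(codewords(children, 4), f[-1])  # {3: '111', 2: '110', 1: '10', 0: '0'} 40
```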


On the running example…

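$f_A = 20$, $f_B = 10$, $f_C = 5$, $f_D = 5$. Merge C and D ($5 + 5 = 10$). Merge B with {C, D} ($10 + 10 = 20$). Merge A with {B, C, D} ($20 + 20 = 40$).

Final cost: $20 \cdot 1 + 10 \cdot 2 + 5 \cdot 3 + 5 \cdot 3 = 70$.
Also: $20 + 10 + 5 + 5 + 10 + 20 = 70$, the sum of the frequencies of all nodes, including both leaves and internal nodes, but not the root.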

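[Figure: the resulting tree: root 40 with children A:20 and an internal node 20; that node has children B:10 and an internal node 10, whose children are C:5 and D:5; branches labeled 0 and 1]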
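The two ways of computing the cost on this running example can be cross-checked with a short Python sketch (the function name `huffman_cost` is ours). It assumes the frequencies A:20, B:10, C:5, D:5 and the leaf depths 1, 2, 3, 3 of the tree shown, and computes the second formula by summing leaf frequencies plus every merged node except the final root.

```python
import heapq


def huffman_cost(freqs):
    """Sum of the frequencies of all non-root nodes of a Huffman tree."""
    heap = list(freqs)
    heapq.heapify(heap)
    node_sum = sum(freqs)  # all leaves
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)  # new internal node with frequency a + b
        if len(heap) > 1:
            node_sum += a + b  # count internal nodes, but not the root
    return node_sum


freq = {"A": 20, "B": 10, "C": 5, "D": 5}
depth = {"A": 1, "B": 2, "C": 3, "D": 3}  # leaf depths in the example tree
by_depth = sum(freq[s] * depth[s] for s in freq)
by_nodes = huffman_cost(list(freq.values()))
print(by_depth, by_nodes)  # 70 70
```

The two sums agree in general, because a leaf at depth $d$ contributes its frequency to exactly $d$ non-root nodes on its path: itself and its $d - 1$ non-root ancestors.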


Summary

We gave two examples of greedy algorithms: MST and Huffman code.

General idea: make the choice that is best at the moment only, without worrying about long-term consequences.

An intriguing question: When do greedy algorithms work? Namely, when is there no need to think ahead?

Matroid theory provides one explanation. See the CLRS book (Section 16.4) for a gentle intro.
