Probabilistic Analysis and Randomized Algorithm

Probabilistic Analysis and Randomized Algorithm. Worst case analysis Probabilistic analysis Need the knowledge of the distribution of the inputs Indicator.

Dec 17, 2015


Worst-case analysis vs. probabilistic analysis

Probabilistic analysis needs knowledge of the distribution of the inputs.

Indicator random variables

Given a sample space S and an event A, the indicator random variable I{A} associated with event A is defined as:

\[ I\{A\} = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{o/w} \end{cases} \]


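E.g.: Consider flipping a fair coin. The sample space is S = {H, T}. Define a random variable Y with Pr{Y = H} = Pr{Y = T} = 1/2. We can define an indicator r.v. $X_H$ associated with the coin coming up heads, i.e. Y = H:

\[ X_H = I\{Y = H\} = \begin{cases} 1 & \text{if } Y = H \\ 0 & \text{if } Y = T \end{cases} \]

\[ E[X_H] = E[I\{Y = H\}] = 1 \cdot \Pr\{Y = H\} + 0 \cdot \Pr\{Y = T\} = \Pr\{Y = H\} = \frac{1}{2} \]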


Lemma: Given a sample space S and an event A in the sample space S, let $X_A = I\{A\}$. Then $E[X_A] = \Pr\{A\}$.

Proof:

\[ E[X_A] = E[I\{A\}] = 1 \cdot \Pr\{A\} + 0 \cdot \Pr\{\bar{A}\} = \Pr\{A\} \]
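The lemma E[X_A] = Pr{A} can be checked exactly on the fair-coin example. The sketch below is only an illustration: the helper names `expectation` and `indicator` are made up here, and the distribution is represented as a plain dictionary from outcomes to probabilities.

```python
from fractions import Fraction

def expectation(dist, f):
    """Expected value of f(outcome) under a finite distribution {outcome: probability}."""
    return sum(p * f(outcome) for outcome, p in dist.items())

def indicator(event):
    """Return the indicator function I{A} for an event given as a set of outcomes."""
    return lambda outcome: 1 if outcome in event else 0

# Fair coin: S = {H, T}; the event A is "the coin comes up heads".
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
heads = {"H"}

e_x_h = expectation(coin, indicator(heads))                            # E[X_H]
pr_heads = sum(p for outcome, p in coin.items() if outcome in heads)   # Pr{Y = H}
```

Using `Fraction` keeps the comparison exact, so `e_x_h` and `pr_heads` can be compared with `==` rather than with a floating-point tolerance.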


Hire-Assistant(n)
    best = 0        // candidate 0 is a least-qualified dummy candidate
    for i = 1 to n
        interview candidate i
        if candidate i is better than candidate best
            best = i
            hire candidate i
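As a rough runnable counterpart to Hire-Assistant, the following sketch assumes each candidate is summarized by a distinct numeric score (higher is better). The function name `hire_assistant` and the list-of-scores representation are assumptions made for illustration, not part of the slides.

```python
def hire_assistant(scores):
    """Return the 0-based indices of the candidates hired, in interview order.

    scores[i] is the quality of candidate i (higher is better).
    Assumes all scores are distinct.
    """
    hired = []
    best = None  # stands in for the least-qualified dummy candidate 0
    for i, score in enumerate(scores):
        # "interview candidate i" is just reading the score here
        if best is None or score > scores[best]:
            best = i
            hired.append(i)
    return hired
```

For example, candidates arriving in increasing order of quality are all hired, while candidates arriving in decreasing order cause only the first to be hired.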


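Let $X_i = I\{\text{candidate } i \text{ is hired}\}$, and let $X = X_1 + X_2 + \cdots + X_n$ be the total number of hires.

Lemma: Assuming that the candidates are presented in a random order, algorithm Hire-Assistant has an average-case total hiring cost of $O(c_h \ln n)$, where $c_h$ is the cost of one hire.

Proof: Candidate i is hired exactly when it is the best of the first i candidates, which happens with probability 1/i under a random order, so $E[X_i] = 1/i$. By linearity of expectation,

\[ E[X] = \sum_{i=1}^{n} E[X_i] = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \ln n + O(1). \]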
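The identity E[X] = 1 + 1/2 + ... + 1/n can be confirmed by brute force for small n. This is a sketch under the assumption that every one of the n! arrival orders is equally likely; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def count_hires(order):
    """Number of times a new best candidate appears when scanning `order`."""
    best = None
    hires = 0
    for score in order:
        if best is None or score > best:
            best = score
            hires += 1
    return hires

def expected_hires(n):
    """Exact E[X]: average number of hires over all n! orders of ranks 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_hires(p) for p in perms), len(perms))

def harmonic(n):
    """The n-th harmonic number H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))
```

Enumeration grows as n!, so this is only practical for very small n; it is meant as a sanity check of the formula, not as a way to compute it.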


Randomized-Hire-Assistant(n)
    randomly permute the list of candidates
    best = 0
    for i = 1 to n
        interview candidate i
        if candidate i is better than candidate best
            best = i
            hire candidate i


Lemma: The expected hiring cost of the algorithm Randomized-Hire-Assistant is $O(c_h \ln n)$.


Permute-By-Sorting(A)
    n = A.length
    let P[1..n] be a new array
    for i = 1 to n
        P[i] = Random(1, n^3)
    sort A, using P as sort keys

After sorting, if P[i] is the j-th smallest priority, then A[i] lies in position j of the output.
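A minimal Python sketch of Permute-By-Sorting follows. It assumes `random.randint` as the stand-in for Random(1, n^3) and returns a new list instead of sorting A in place; the optional `rng` parameter is an illustrative addition so that a seeded generator can be passed in.

```python
import random

def permute_by_sorting(a, rng=random):
    """Return a permutation of `a` ordered by independently drawn random priorities."""
    n = len(a)
    # randint is inclusive at both ends, matching Random(1, n^3).
    priorities = [rng.randint(1, n ** 3) for _ in range(n)]
    # Sort the positions by priority; a[i] lands in position j when
    # priorities[i] is the j-th smallest priority.
    order = sorted(range(n), key=lambda i: priorities[i])
    return [a[i] for i in order]
```

If two priorities happen to collide, the stable sort still returns a valid permutation, but the uniformity argument no longer applies to that run; the range 1..n^3 is what makes collisions unlikely.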


Lemma: Procedure Permute-By-Sorting produces a uniform random permutation of the input, assuming that all priorities P[i] are distinct.


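Define event $E_i$: A[i] receives the i-th smallest priority.

\[ \Pr\{E_1 \cap E_2 \cap \cdots \cap E_{n-1} \cap E_n\} = \Pr\{E_1\}\,\Pr\{E_2 \mid E_1\}\,\Pr\{E_3 \mid E_1 \cap E_2\} \cdots \Pr\{E_n \mid E_1 \cap E_2 \cap \cdots \cap E_{n-1}\} \]

\[ \Pr\{E_1\} = \frac{1}{n}, \qquad \Pr\{E_2 \mid E_1\} = \frac{1}{n-1} \]

\[ \Pr\{E_i \mid E_1 \cap E_2 \cap \cdots \cap E_{i-1}\} = \frac{1}{n-i+1} \]

\[ \Pr\{E_1 \cap E_2 \cap \cdots \cap E_{n-1} \cap E_n\} = \frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{2} \cdot \frac{1}{1} = \frac{1}{n!}, \]

which is the probability of obtaining the identity permutation.

The same argument holds for any other fixed permutation.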


Randomize-In-Place(A): a better method
    n = A.length
    for i = 1 to n
        swap A[i] with A[Random(i, n)]

Lemma: The above procedure computes a uniform random permutation.
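The sketch below gives a 0-based Python version of Randomize-In-Place, plus a small exhaustive check of the lemma. The check relies on the observation that the procedure makes n independent choices with n, n-1, ..., 1 options, so there are n! equally likely choice sequences; the helper name `outcomes_over_all_choices` is made up for this illustration.

```python
import random
from collections import Counter
from itertools import product

def randomize_in_place(a, rng=random):
    """Shuffle list `a` in place and return it."""
    n = len(a)
    for i in range(n):
        j = rng.randint(i, n - 1)  # 0-based analogue of Random(i, n)
        a[i], a[j] = a[j], a[i]
    return a

def outcomes_over_all_choices(n):
    """Run the swap loop for every possible sequence of random choices.

    Each of the n! choice sequences is equally likely, so the procedure is
    uniform exactly when every permutation appears the same number of times.
    """
    counts = Counter()
    for choices in product(*(range(i, n) for i in range(n))):
        a = list(range(n))
        for i, j in enumerate(choices):
            a[i], a[j] = a[j], a[i]
        counts[tuple(a)] += 1
    return counts
```

For n = 4, every one of the 24 permutations is produced by exactly one choice sequence, which is consistent with the lemma.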


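The birthday paradox: How many people must there be in a room before there is a 50% chance that two of them were born on the same day of the year?

(1) Suppose there are k people and n days in a year. Let $b_i$ be the i-th person's birthday, for i = 1, …, k. Birthdays are assumed independent and uniformly distributed:

\[ \Pr\{b_i = r\} = \frac{1}{n} \quad \text{for } i = 1, \ldots, k \text{ and } r = 1, 2, \ldots, n \]

\[ \Pr\{b_i = r,\ b_j = r\} = \Pr\{b_i = r\} \cdot \Pr\{b_j = r\} = \frac{1}{n^2} \]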


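\[ \Pr\{b_i = b_j\} = \sum_{r=1}^{n} \Pr\{b_i = r,\ b_j = r\} = \sum_{r=1}^{n} \frac{1}{n^2} = \frac{1}{n} \]

Define event $A_i$: person i's birthday is different from person j's for all j < i.

Let $B_k$ be the event that k people have distinct birthdays:

\[ B_k = \bigcap_{i=1}^{k} A_i, \qquad B_k = A_k \cap B_{k-1} \]

\[ \Pr\{B_k\} = \Pr\{B_{k-1} \cap A_k\} = \Pr\{B_{k-1}\}\,\Pr\{A_k \mid B_{k-1}\}, \quad \text{where } \Pr\{B_1\} = \Pr\{A_1\} = 1 \]

\[
\begin{aligned}
\Pr\{B_k\} &= \Pr\{B_{k-1}\}\,\Pr\{A_k \mid B_{k-1}\} \\
&= \Pr\{B_{k-2}\}\,\Pr\{A_{k-1} \mid B_{k-2}\}\,\Pr\{A_k \mid B_{k-1}\} \\
&\;\;\vdots \\
&= \Pr\{B_1\}\,\Pr\{A_2 \mid B_1\}\,\Pr\{A_3 \mid B_2\} \cdots \Pr\{A_k \mid B_{k-1}\} \\
&= 1 \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n} \\
&= 1 \cdot \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k-1}{n}\right) \\
&\le e^{-1/n}\, e^{-2/n} \cdots e^{-(k-1)/n} \qquad (\text{since } 1 + x \le e^{x}) \\
&= e^{-\sum_{i=1}^{k-1} i/n} = e^{-k(k-1)/2n} \\
&\le \frac{1}{2} \qquad \text{when } -\frac{k(k-1)}{2n} \le \ln\frac{1}{2}
\end{aligned}
\]

The prob. that all k birthdays are distinct is at most 1/2 when $k(k-1) \ge 2n \ln 2$, i.e. when $k \ge \left(1 + \sqrt{1 + (8 \ln 2)\,n}\right)/2$. For n = 365, we have k ≥ 23.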
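The threshold k = 23 for n = 365 can be reproduced numerically. The sketch below computes the exact product (1 - 1/n)(1 - 2/n)...(1 - (k-1)/n) and, separately, the bound k >= (1 + sqrt(1 + (8 ln 2) n))/2; the function names are illustrative, and uniform, independent birthdays are assumed as in the analysis.

```python
import math

def prob_all_distinct(k, n=365):
    """Exact Pr{B_k}: probability that k people all have distinct birthdays."""
    p = 1.0
    for i in range(1, k):
        p *= (n - i) / n
    return p

def smallest_k_exact(n=365):
    """Smallest k for which Pr{B_k} drops to 1/2 or below."""
    k = 1
    while prob_all_distinct(k, n) > 0.5:
        k += 1
    return k

def smallest_k_from_bound(n=365):
    """Smallest integer k satisfying k >= (1 + sqrt(1 + (8 ln 2) n)) / 2."""
    return math.ceil((1 + math.sqrt(1 + 8 * math.log(2) * n)) / 2)
```

For n = 365 both routes give the same answer, although in general the exponential bound is only a sufficient condition and need not match the exact threshold.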


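(2) Analysis using indicator random variables

For each pair (i, j) of the k people in the room, define the indicator r.v. $X_{ij}$, for 1 ≤ i < j ≤ k, by

\[ X_{ij} = I\{\text{person } i \text{ and person } j \text{ have the same birthday}\} = \begin{cases} 1 & \text{if person } i \text{ and person } j \text{ have the same birthday} \\ 0 & \text{o/w} \end{cases} \]

\[ E[X_{ij}] = \Pr\{\text{person } i \text{ and person } j \text{ have the same birthday}\} = \frac{1}{n} \]

Let

\[ X = \sum_{i=1}^{k} \sum_{j=i+1}^{k} X_{ij} \]

\[ E[X] = E\left[\sum_{i=1}^{k} \sum_{j=i+1}^{k} X_{ij}\right] = \sum_{i=1}^{k} \sum_{j=i+1}^{k} E[X_{ij}] = \binom{k}{2} \frac{1}{n} = \frac{k(k-1)}{2n} \]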


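When $k(k-1) \ge 2n$, the expected number of pairs of people with the same birthday is at least 1:

\[ k^2 - k - 2n \ge 0 \;\Longrightarrow\; k \ge \frac{1 + \sqrt{1 + 8n}}{2} \]

For n = 365, this gives k ≥ 28: with at least 28 people, we expect to find at least one matching pair.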
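The value k = 28 follows directly from E[X] = k(k-1)/(2n). A small sketch, with illustrative function names, that evaluates this expectation exactly and searches for the smallest k with E[X] >= 1:

```python
from fractions import Fraction

def expected_matching_pairs(k, n=365):
    """E[X] = k(k-1)/(2n): expected number of pairs sharing a birthday."""
    return Fraction(k * (k - 1), 2 * n)

def smallest_k_with_expected_pair(n=365):
    """Smallest k for which the expected number of matching pairs is at least 1."""
    k = 1
    while expected_matching_pairs(k, n) < 1:
        k += 1
    return k
```

Note that this criterion (expected number of matching pairs at least 1) is different from the 50% probability criterion used in part (1), which is why the two analyses give 28 and 23 respectively.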


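Balls and bins problem: Randomly toss identical balls into b bins, numbered 1, 2, …, b. The probability that a tossed ball lands in any given bin is 1/b.

(a) How many balls fall in a given bin?

If n balls are tossed, the expected number of balls that fall in the given bin is n/b.

(b) How many balls must one toss, on the average, until a given bin contains a ball?

By the geometric distribution with success probability 1/b, the expected number of tosses e satisfies

\[ e = \frac{1}{b}\left[1 + 2\left(1 - \frac{1}{b}\right) + 3\left(1 - \frac{1}{b}\right)^{2} + \cdots\right] \]

\[ \left(1 - \frac{1}{b}\right) e = \frac{1}{b}\left[\left(1 - \frac{1}{b}\right) + 2\left(1 - \frac{1}{b}\right)^{2} + \cdots\right] \]

Subtracting the second equation from the first,

\[ \frac{1}{b}\, e = \frac{1}{b}\left[1 + \left(1 - \frac{1}{b}\right) + \left(1 - \frac{1}{b}\right)^{2} + \cdots\right] = \frac{1}{b} \cdot \frac{1}{1 - \left(1 - \frac{1}{b}\right)} = 1 \]

so e = b.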


(c) (Coupon collector’s problem) How many balls must one toss until every bin contains at least one ball?

We want to know the expected number n of tosses required to get b hits, where a hit is a toss in which the ball lands in an empty bin.

The ith stage consists of the tosses after the (i-1)st hit until the ith hit

For each toss during the ith stage, there are i-1 bins that contain balls and b-i+1 empty bins

Thus, for each toss in the ith stage, the probability of obtaining a hit is (b-i+1)/b

Let $n_i$ be the number of tosses in the ith stage. Thus the number of tosses required to get b hits is $n = \sum_{i=1}^{b} n_i$.

Each $n_i$ has a geometric distribution with probability of success (b-i+1)/b, so $E[n_i] = b/(b-i+1)$.

\[ E[n] = E\left[\sum_{i=1}^{b} n_i\right] = \sum_{i=1}^{b} E[n_i] = \sum_{i=1}^{b} \frac{b}{b-i+1} = b \sum_{i=1}^{b} \frac{1}{i} = b(\ln b + O(1)) = O(b \ln b) \]
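The stage-by-stage sum for the coupon collector's problem can be evaluated exactly for small b. This sketch, with a hypothetical function name, adds up E[n_i] = b/(b-i+1) over the b stages, which should equal b times the b-th harmonic number.

```python
from fractions import Fraction

def expected_tosses_until_all_bins_hit(b):
    """Sum of E[n_i] = b / (b - i + 1) over stages i = 1..b, as an exact fraction."""
    return sum(Fraction(b, b - i + 1) for i in range(1, b + 1))

def harmonic(b):
    """The b-th harmonic number H_b = 1 + 1/2 + ... + 1/b."""
    return sum(Fraction(1, i) for i in range(1, b + 1))
```

For instance, with b = 3 bins the stages contribute 1 + 3/2 + 3 = 11/2 expected tosses.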


Streaks

Flip a fair coin n times. What is the expected length of the longest streak of consecutive heads? Ans: Θ(lg n)

Let $A_{ik}$ be the event that a streak of heads of length at least k begins with the ith coin flip.

For j = 0, 1, 2, …, n, let $L_j$ be the event that the longest streak of heads has length exactly j, and let L be the length of the longest streak.

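\[ \Pr\{A_{ik}\} = \frac{1}{2^{k}} \]

For $k = 2\lceil \lg n \rceil$,

\[ \Pr\{A_{i,\,2\lceil \lg n \rceil}\} = \frac{1}{2^{2\lceil \lg n \rceil}} \le \frac{1}{2^{2 \lg n}} = \frac{1}{n^{2}} \]

\[ E[L] = \sum_{j=0}^{n} j \Pr\{L_j\} \]

Note that the events $L_j$ for j = 0, 1, …, n are disjoint. So the probability that a streak of heads of length at least $2\lceil \lg n \rceil$ begins anywhere is

\[ \sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} \]

By the union bound over the possible starting positions,

\[ \Pr\left\{\bigcup_{i=1}^{n-2\lceil \lg n \rceil+1} A_{i,\,2\lceil \lg n \rceil}\right\} \le \sum_{i=1}^{n-2\lceil \lg n \rceil+1} \frac{1}{n^{2}} < \sum_{i=1}^{n} \frac{1}{n^{2}} = \frac{1}{n} \]

Thus,

\[ \sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} < \frac{1}{n}, \quad \text{while} \quad \sum_{j=0}^{n} \Pr\{L_j\} = 1. \]

We have

\[ \sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} \le 1 \]

\[
\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{2\lceil \lg n \rceil - 1} j \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} j \Pr\{L_j\} \\
&< \sum_{j=0}^{2\lceil \lg n \rceil - 1} (2\lceil \lg n \rceil) \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} n \Pr\{L_j\} \\
&= 2\lceil \lg n \rceil \sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} + n \sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} \\
&< 2\lceil \lg n \rceil \cdot 1 + n \cdot \frac{1}{n} = O(\lg n)
\end{aligned}
\]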


More generally, for r ≥ 1,

\[ \Pr\{A_{i,\,r\lceil \lg n \rceil}\} = \frac{1}{2^{r\lceil \lg n \rceil}} \le \frac{1}{n^{r}} \]

The probability is at most $n/n^{r} = 1/n^{r-1}$ that the longest streak is at least $r\lceil \lg n \rceil$.

Claim: The expected length of the longest streak of heads in n coin flips is Ω(lg n).

We look for streaks of length s by partitioning the n flips into approximately n/s groups of s flips each.

Take $s = \lfloor (\lg n)/2 \rfloor$.

\[ \Pr\{A_{i,\,\lfloor (\lg n)/2 \rfloor}\} = \frac{1}{2^{\lfloor (\lg n)/2 \rfloor}} \ge \frac{1}{\sqrt{n}} \]

The probability that a streak of heads of length at least $\lfloor (\lg n)/2 \rfloor$ does not begin in position i is at most $1 - 1/\sqrt{n}$.

The groups are formed from mutually exclusive, ind. coin flips, so the prob. that every one of the groups fails to be a streak of length $\lfloor (\lg n)/2 \rfloor$ is at most

\[
\begin{aligned}
\left(1 - \frac{1}{\sqrt{n}}\right)^{\lfloor n / \lfloor (\lg n)/2 \rfloor \rfloor}
&\le \left(1 - \frac{1}{\sqrt{n}}\right)^{n / \lfloor (\lg n)/2 \rfloor - 1} \\
&\le \left(1 - \frac{1}{\sqrt{n}}\right)^{2n/\lg n - 1} \\
&\le e^{-(2n/\lg n - 1)/\sqrt{n}} \\
&= O(e^{-\lg n}) = O(1/n)
\end{aligned}
\]

Thus, the prob. that the longest streak exceeds $\lfloor (\lg n)/2 \rfloor$ is

\[ \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \Pr\{L_j\} \ge 1 - O(1/n) \]

Why?

\[
\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor} j \Pr\{L_j\} + \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} j \Pr\{L_j\} \\
&\ge \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \lfloor (\lg n)/2 \rfloor \Pr\{L_j\} \\
&= \lfloor (\lg n)/2 \rfloor \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \Pr\{L_j\} \\
&\ge \lfloor (\lg n)/2 \rfloor \,(1 - O(1/n)) = \Omega(\lg n)
\end{aligned}
\]


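Using indicator r.v.:

Let $X_{ik} = I\{A_{ik}\}$, and let $X = \sum_{i=1}^{n-k+1} X_{ik}$.

\[ E[X] = E\left[\sum_{i=1}^{n-k+1} X_{ik}\right] = \sum_{i=1}^{n-k+1} E[X_{ik}] = \sum_{i=1}^{n-k+1} \Pr\{A_{ik}\} = \sum_{i=1}^{n-k+1} \frac{1}{2^{k}} = \frac{n-k+1}{2^{k}} \]

If k = c lg n, for some constant c,

\[ E[X] = \frac{n - c \lg n + 1}{2^{c \lg n}} = \frac{n - c \lg n + 1}{n^{c}} = \frac{1}{n^{c-1}} - \frac{(c \lg n - 1)/n}{n^{c-1}} = \Theta\left(\frac{1}{n^{c-1}}\right) \]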


If c is large, the expected number of streaks of length c lg n is very small. Therefore, a streak of such a length is very unlikely to occur.

If c = 1/2, then we obtain

\[ E[X] = \Theta\left(\frac{1}{n^{1/2 - 1}}\right) = \Theta(n^{1/2}) \]

and we expect that there will be a large number of streaks of length (1/2) lg n.

Conclusion: The length of the longest streak is Θ(lg n). ■
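The formula E[X] = (n-k+1)/2^k for the expected number of positions where a streak of at least k heads begins can be verified by enumerating all 2^n outcomes for small n. The sketch below is illustrative only; the function names and the "H"/"T" encoding of flips are assumptions.

```python
from fractions import Fraction
from itertools import product

def longest_streak(flips):
    """Length L of the longest run of consecutive 'H' in a sequence of flips."""
    best = run = 0
    for f in flips:
        run = run + 1 if f == "H" else 0
        best = max(best, run)
    return best

def expected_streak_starts(n, k):
    """Exact E[X]: average number of positions i at which event A_ik occurs."""
    total = 0
    for flips in product("HT", repeat=n):
        total += sum(
            all(f == "H" for f in flips[i:i + k]) for i in range(n - k + 1)
        )
    return Fraction(total, 2 ** n)
```

The enumeration takes time exponential in n, so it only serves to confirm the closed form on tiny inputs.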


The on-line hiring problem:

To hire an assistant, an employment agency sends one candidate each day. After interviewing that person you decide to either hire that person or not. The process stops when a person is hired.

What is the trade-off between minimizing the amount of interviewing and maximizing the quality of the candidate hired?


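Strategy: interview and reject the first k candidates, then hire the first later candidate who is better than all of them (or the last candidate if there is none). What is the best k?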


Let $M(j) = \max_{1 \le i \le j} \{\text{score}(i)\}$.

Let S be the event that the best-qualified applicant is chosen.

Let $S_i$ be the event that the best-qualified applicant is chosen and is the i-th one interviewed.

The $S_i$ are disjoint, and we have $\Pr\{S\} = \sum_{i=1}^{n} \Pr\{S_i\}$.

If the best-qualified applicant is one of the first k, that applicant is never chosen, so $\Pr\{S_i\} = 0$ for i = 1, …, k and thus

\[ \Pr\{S\} = \sum_{i=k+1}^{n} \Pr\{S_i\} \]


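Let $B_i$ be the event that the best-qualified applicant is in position i.

Let $O_i$ denote the event that none of the applicants in positions k+1 through i-1 is chosen.

If $S_i$ happens, then $B_i$ and $O_i$ must both happen.

$B_i$ and $O_i$ are independent. Why? $O_i$ depends only on the relative ordering of the values in positions 1 through i-1, whereas $B_i$ depends only on whether the value in position i is greater than all the others.

\[ \Pr\{S_i\} = \Pr\{B_i \cap O_i\} = \Pr\{B_i\}\,\Pr\{O_i\} \]

Clearly, $\Pr\{B_i\} = 1/n$.

$\Pr\{O_i\} = k/(i-1)$. Why? $O_i$ occurs exactly when the best of the first i-1 applicants is among the first k positions, and that maximum is equally likely to be in any of the i-1 positions.

Thus $\Pr\{S_i\} = k/(n(i-1))$.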


\[ \Pr\{S\} = \sum_{i=k+1}^{n} \Pr\{S_i\} = \sum_{i=k+1}^{n} \frac{k}{n(i-1)} = \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i-1} = \frac{k}{n} \sum_{i=k}^{n-1} \frac{1}{i} \]

Bounding the sum by integrals:

\[ \int_{k}^{n} \frac{1}{x}\,dx \le \sum_{i=k}^{n-1} \frac{1}{i} \le \int_{k-1}^{n-1} \frac{1}{x}\,dx \]

\[ \frac{k}{n}(\ln n - \ln k) \le \Pr\{S\} \le \frac{k}{n}(\ln(n-1) - \ln(k-1)) \]

Differentiate $\frac{k}{n}(\ln n - \ln k)$ with respect to k. Setting the derivative to zero, we have

\[ \frac{1}{n}(\ln n - \ln k - 1) = 0. \]

Thus k = n/e and Pr{S} ≥ 1/e.
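The closed form Pr{S} = (k/n) * sum over i = k+1..n of 1/(i-1) can be compared with a brute-force count for small n. The sketch assumes the usual strategy behind this analysis: reject the first k applicants, then hire the first later applicant who beats all of them, falling back to the last applicant. The function names are illustrative, and the comparison is meant for 1 <= k < n.

```python
from fractions import Fraction
from itertools import permutations

def online_maximum(k, scores):
    """Reject the first k applicants, then hire the first one who beats them all.

    Returns the 0-based index hired (the last applicant if nobody qualifies).
    """
    n = len(scores)
    best_seen = max(scores[:k], default=float("-inf"))
    for i in range(k, n):
        if scores[i] > best_seen:
            return i
    return n - 1

def success_probability(n, k):
    """Exact Pr{S} by enumerating all n! arrival orders (small n only)."""
    wins = 0
    total = 0
    for order in permutations(range(n)):
        total += 1
        if order[online_maximum(k, order)] == n - 1:  # n - 1 is the best score
            wins += 1
    return Fraction(wins, total)

def closed_form(n, k):
    """Pr{S} = (k/n) * sum_{i=k+1}^{n} 1/(i-1), valid for 1 <= k < n."""
    return Fraction(k, n) * sum(Fraction(1, i - 1) for i in range(k + 1, n + 1))
```

For n = 6 the closed form is maximized at k = 2, which is close to n/e ≈ 2.2, in line with the conclusion k = n/e.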