Advanced Algorithms – COMS31900
Probability recap (based on slides by Markus Jalsenius)
Benjamin Sach
Apr 13, 2017
Randomness and probability

Probability

The sample space S is the set of outcomes of an experiment.

EXAMPLES

Roll a die: S = {1, 2, 3, 4, 5, 6}.

Flip a coin: S = {H, T}.

Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10,000, £100,000}.

For x ∈ S, the probability of x, written Pr(x), is a real number
between 0 and 1, such that ∑_{x∈S} Pr(x) = 1.

Pr is ‘just’ a function which maps each x ∈ S to Pr(x) ∈ [0, 1].

EXAMPLES

Roll a die: Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1/6.

Flip a coin: Pr(H) = Pr(T) = 1/2.

Lottery: Pr(£0) = 0.9, Pr(£10) = 0.08, . . . , Pr(£100,000) = 0.0001.
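A finite probability space is just a map from outcomes to probabilities. As a minimal sketch (the dict representation is ours, not from the slides), the die and coin examples can be modelled and checked like this:

```python
from fractions import Fraction

# A finite probability space: a map from each outcome x in S to Pr(x).
# Exact fractions avoid floating-point drift when checking the sum.
die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def is_distribution(pr):
    """Check that every Pr(x) lies in [0, 1] and that they sum to 1."""
    return all(0 <= p <= 1 for p in pr.values()) and sum(pr.values()) == 1

print(is_distribution(die))   # True
print(is_distribution(coin))  # True
```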

Probability

The sample space is not necessarily finite.

EXAMPLE

Flip a coin until the first tail shows up:
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, . . . }.

Pr(“It takes n coin flips”) = (1/2)^n, and

∑_{n=1}^{∞} (1/2)^n = 1/2 + 1/4 + 1/8 + 1/16 + . . . = 1.
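The geometric series above can be checked numerically; a partial sum of the first N terms falls short of 1 by exactly (1/2)^N (a small sketch, not from the slides):

```python
from fractions import Fraction

# Pr("it takes n coin flips to see the first tail") = (1/2)^n.
# The sample space is infinite, but the probabilities still sum to 1:
# the partial sums approach 1, falling short by exactly (1/2)^N.
def pr_takes_n_flips(n):
    return Fraction(1, 2) ** n

N = 50
partial = sum(pr_takes_n_flips(n) for n in range(1, N + 1))
print(1 - partial == Fraction(1, 2) ** N)  # True
```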

Event

An event is a subset V of the sample space S.

The probability of event V happening, denoted Pr(V), is

Pr(V) = ∑_{x∈V} Pr(x).

EXAMPLE

Flip a coin 3 times: S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}.
For each x ∈ S, Pr(x) = 1/8.

Define V to be the event “the first and last coin flips are the same”,
in other words, V = {HHH, HTH, THT, TTT}.

What is Pr(V)?

Pr(V) = Pr(HHH) + Pr(HTH) + Pr(THT) + Pr(TTT) = 4 × 1/8 = 1/2.
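The three-flip example can be reproduced by enumerating the sample space directly (an illustrative sketch; the slides do the sum by hand):

```python
from itertools import product
from fractions import Fraction

# Enumerate S for three coin flips; each outcome has Pr(x) = 1/8.
S = list(product("HT", repeat=3))
pr = {x: Fraction(1, 8) for x in S}

# V = "the first and last coin flips are the same".
V = [x for x in S if x[0] == x[2]]

pr_V = sum(pr[x] for x in V)  # Pr(V) = sum of Pr(x) over x in V
print(len(S), len(V), pr_V)   # 8 4 1/2
```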

Random variable

A random variable (r.v.) Y over sample space S is a function S → R,
i.e. it maps each outcome x ∈ S to some real number Y(x).

The probability of Y taking value y is

Pr(Y = y) = ∑_{x ∈ S s.t. Y(x) = y} Pr(x),

i.e. the sum over all values of x such that Y(x) = y.

EXAMPLE

Two coin flips: S = {HH, HT, TH, TT}, with
Y(HH) = 2, Y(HT) = 1, Y(TH) = 5, Y(TT) = 2.

What is Pr(Y = 2)?

Pr(Y = 2) = ∑_{x∈{HH,TT}} Pr(x) = 1/4 + 1/4 = 1/2.

The expected value (the mean) of a r.v. Y, denoted E(Y), is

E(Y) = ∑_{x∈S} Y(x) · Pr(x).

For the example above,

E(Y) = (2 · 1/2) + (1 · 1/4) + (5 · 1/4) = 5/2.
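The two-flip random variable above fits in a few lines of Python (the dict encoding of Y is ours, for illustration):

```python
from fractions import Fraction

# Two fair coin flips; Y maps each outcome to a real number, as on the slide.
pr = {x: Fraction(1, 4) for x in ("HH", "HT", "TH", "TT")}
Y = {"HH": 2, "HT": 1, "TH": 5, "TT": 2}

# Pr(Y = y): sum Pr(x) over the outcomes x with Y(x) = y.
def pr_Y_equals(y):
    return sum(p for x, p in pr.items() if Y[x] == y)

# E(Y) = sum over x in S of Y(x) * Pr(x).
E_Y = sum(Y[x] * pr[x] for x in pr)

print(pr_Y_equals(2))  # 1/2
print(E_Y)             # 5/2
```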

Linearity of expectation

THEOREM (Linearity of expectation)

Let Y_1, Y_2, . . . , Y_k be k random variables. Then

E(∑_{i=1}^{k} Y_i) = ∑_{i=1}^{k} E(Y_i).

Linearity of expectation always holds, regardless of whether the random
variables are independent or not.

EXAMPLE

Roll two dice. Let the r.v. Y be the sum of the values. What is E(Y)?

Approach 1: (without the theorem)

The sample space S = {(1, 1), (1, 2), (1, 3), . . . , (6, 6)} (36 outcomes).

E(Y) = ∑_{x∈S} Y(x) · Pr(x) = (1/36) ∑_{x∈S} Y(x)
     = (1/36)(1 · 2 + 2 · 3 + 3 · 4 + · · · + 1 · 12) = 7.

Approach 2: (with the theorem)

Let the r.v. Y_1 be the value of the first die and Y_2 the value of the second.

E(Y_1) = E(Y_2) = 3.5,

so E(Y) = E(Y_1 + Y_2) = E(Y_1) + E(Y_2) = 7.
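Both approaches can be checked against each other (a small sketch; the enumeration is the same 36-outcome space as above):

```python
from fractions import Fraction
from itertools import product

# Approach 1: enumerate all 36 equally likely outcomes of two dice.
S = list(product(range(1, 7), repeat=2))
E_sum = sum((a + b) * Fraction(1, 36) for a, b in S)

# Approach 2: linearity of expectation, E(Y1 + Y2) = E(Y1) + E(Y2).
E_one_die = sum(v * Fraction(1, 6) for v in range(1, 7))  # 7/2

print(E_sum)          # 7
print(E_one_die * 2)  # 7
```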

Indicator random variables

An indicator random variable is a r.v. that can only be 0 or 1
(usually referred to by the letter I).

Fact: E(I) = 0 · Pr(I = 0) + 1 · Pr(I = 1) = Pr(I = 1).

Often an indicator r.v. I is associated with an event such that
I = 1 if the event happens (and I = 0 otherwise).

Indicator random variables and linearity of expectation work great together!

EXAMPLE

Roll a die n times. How many rolls do we expect to show a value
that is at least the value of the previous roll?

For j ∈ {2, . . . , n}, let indicator r.v. I_j = 1 if the value of the jth roll
is at least the value of the previous roll (and I_j = 0 otherwise).

Pr(I_j = 1) = 21/36 = 7/12. (by counting the outcomes)

By linearity of expectation,

E(∑_{j=2}^{n} I_j) = ∑_{j=2}^{n} E(I_j) = ∑_{j=2}^{n} Pr(I_j = 1) = (n − 1) · 7/12.
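The "counting the outcomes" step and the final expectation can be verified directly (illustrative sketch, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Pr(I_j = 1): the jth roll is at least the previous one.
# Count the favourable (previous, current) pairs out of 36.
favourable = sum(1 for prev, cur in product(range(1, 7), repeat=2) if cur >= prev)
p = Fraction(favourable, 36)
print(favourable, p)  # 21 7/12

# Expected number of such rolls among n rolls, by linearity: (n - 1) * 7/12.
def expected_non_decreasing_rolls(n):
    return (n - 1) * p

print(expected_non_decreasing_rolls(13))  # 7
```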

Markov’s inequality

EXAMPLE

Suppose that the average (mean) speed on the motorway is 60 mph.
It then follows that at most 1/2 of all cars drive at least 120 mph,
and at most 2/3 of all cars drive at least 90 mph,
. . . otherwise the mean must be higher than 60 mph. (a contradiction)

THEOREM (Markov’s inequality)

If X is a non-negative r.v., then for all a > 0,

Pr(X ≥ a) ≤ E(X)/a.

From the example above:

Pr(speed of a random car ≥ 120 mph) ≤ 60/120 = 1/2,
Pr(speed of a random car ≥ 90 mph) ≤ 60/90 = 2/3.
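Markov's inequality can be sanity-checked on any concrete non-negative distribution; the speed distribution below is invented for illustration (it has mean 60, but is otherwise an assumption, not from the slides):

```python
from fractions import Fraction

# A hypothetical speed distribution with mean 60 mph (assumed for the demo).
speeds = {40: Fraction(1, 2), 60: Fraction(1, 4), 100: Fraction(1, 4)}
E_X = sum(v * p for v, p in speeds.items())  # 60

def pr_at_least(a):
    """Pr(X >= a) for the discrete distribution above."""
    return sum(p for v, p in speeds.items() if v >= a)

# Markov: Pr(X >= a) <= E(X) / a, for every a > 0.
for a in (90, 120):
    assert pr_at_least(a) <= E_X / a

print(E_X, pr_at_least(90), E_X / Fraction(90))  # 60 1/4 2/3
```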

Page 79: Probability Recap

Markov’s inequality

EXAMPLE

n people go to a party, leaving their hats at the door.Each person leaves with a random hat.

Page 80: Probability Recap

Markov’s inequality

EXAMPLE

n people go to a party, leaving their hats at the door.Each person leaves with a random hat.

How many people leave with their own hat?

Page 81: Probability Recap

Markov’s inequality

For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,

EXAMPLE

n people go to a party, leaving their hats at the door.Each person leaves with a random hat.

How many people leave with their own hat?

otherwise Ij = 0.

Page 82: Probability Recap

Markov’s inequality

For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,

EXAMPLE

n people go to a party, leaving their hats at the door.Each person leaves with a random hat.

How many people leave with their own hat?

E( n∑j=1

Ij

)=

n∑j=1

E(Ij) =n∑j=1

Pr(Ij = 1) = n· 1n

= 1.

otherwise Ij = 0.By linearity of expectation. . .

Page 83: Probability Recap

Markov’s inequality

For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat,

EXAMPLE

n people go to a party, leaving their hats at the door.Each person leaves with a random hat.

How many people leave with their own hat?

E( n∑j=1

Ij

)=

n∑j=1

E(Ij) =n∑j=1

Pr(Ij = 1) = n· 1n

= 1.

otherwise Ij = 0.By linearity of expectation. . . Fact: E(I) = Pr(I = 1).

Page 84–89: Probability Recap

Markov’s inequality

EXAMPLE

n people go to a party, leaving their hats at the door. Each person leaves with a random hat.

How many people leave with their own hat?

For j ∈ {1, . . . , n}, let indicator r.v. Ij = 1 if the jth person gets their own hat, otherwise Ij = 0. By linearity of expectation,

E(∑_{j=1}^{n} Ij) = ∑_{j=1}^{n} E(Ij) = ∑_{j=1}^{n} Pr(Ij = 1) = n · (1/n) = 1.

By Markov’s inequality (recall: Pr(X ≥ a) ≤ E(X)/a),

Pr(5 or more people leave with their own hats) ≤ 1/5,
Pr(at least 1 person leaves with their own hat) ≤ 1/1 = 1.

(Sometimes Markov’s inequality is not particularly informative.)

In fact, here it can be shown that as n → ∞, the probability that at least one person leaves with their own hat tends to 1 − 1/e.
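The hat-check example lends itself to a quick Monte Carlo sanity check. The sketch below (Python, not part of the slides; the function names are invented for illustration) estimates both E(number of own hats) and Pr(at least one own hat) for n = 10.

```python
import random

def hat_check_trial(n):
    """Hand each of n people a uniformly random hat (a random permutation);
    return how many people end up with their own hat."""
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

def estimate(n, trials=100_000):
    """Estimate E(own-hat count) and Pr(at least one own hat) by simulation."""
    results = [hat_check_trial(n) for _ in range(trials)]
    mean_matches = sum(results) / trials                    # should be near 1
    p_at_least_one = sum(1 for r in results if r > 0) / trials
    return mean_matches, p_at_least_one
```

With enough trials the mean settles near 1 and the probability near 1 − 1/e ≈ 0.632, matching the limit quoted on the slide.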

Page 90: Probability Recap

Markov’s inequality

COROLLARY

If X is a non-negative r.v. that only takes integer values, then

Pr(X > 0) = Pr(X ≥ 1) ≤ E(X).

For an indicator r.v. I, the bound is tight (=), as Pr(I > 0) = E(I).
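The corollary can be checked directly on any explicit finite distribution. A minimal sketch (hypothetical helper names, not from the slides), with a distribution given as a value → probability map:

```python
def expectation(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def pr_positive(dist):
    """Pr(X > 0) for the same representation."""
    return sum(p for v, p in dist.items() if v > 0)

# X takes values 0, 1, 2 with probabilities 0.7, 0.2, 0.1:
# Pr(X > 0) = 0.3 <= E(X) = 0.4, as the corollary promises.
X = {0: 0.7, 1: 0.2, 2: 0.1}

# For an indicator I the bound is tight: Pr(I > 0) = E(I) = 0.4.
I = {0: 0.6, 1: 0.4}
```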

Page 91–101: Probability Recap

Union bound

THEOREM (union bound)

Let V1, . . . , Vk be k events. Then

Pr(⋃_{i=1}^{k} Vi) ≤ ∑_{i=1}^{k} Pr(Vi).

This is the probability that at least one of the events happens.

This bound is tight (=) when the events are all disjoint.
(Vi and Vj are disjoint iff Vi ∩ Vj is empty.)

PROOF

Define indicator r.v. Ij to be 1 if event Vj happens, otherwise Ij = 0.

Let the r.v. X = ∑_{j=1}^{k} Ij be the number of events that happen.

Pr(⋃_{j=1}^{k} Vj) = Pr(X > 0)              (X > 0 iff at least one Vj happens)
                   ≤ E(X)                   (by the Markov corollary)
                   = E(∑_{j=1}^{k} Ij)
                   = ∑_{j=1}^{k} E(Ij)      (linearity of expectation)
                   = ∑_{j=1}^{k} Pr(Vj).    (E(Ij) = Pr(Ij = 1) = Pr(Vj))
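Each step of the proof can be mirrored mechanically on a small uniform sample space. In this sketch (Python; the three events are arbitrary examples, not from the slides), X counts how many events contain the sampled outcome, and the chain Pr(⋃ Vj) = Pr(X > 0) ≤ E(X) = ∑ Pr(Vj) is verified with exact fractions:

```python
from fractions import Fraction

space = list(range(1, 7))              # a fair die, uniform outcomes
events = [{3, 4}, {1, 2, 3}, {5, 6}]   # illustrative events V_1, ..., V_k
N = len(space)

# X(x) = number of events that happen on outcome x (the sum of indicators)
X = {x: sum(1 for V in events if x in V) for x in space}

pr_union = Fraction(sum(1 for x in space if X[x] > 0), N)  # Pr(X > 0)
e_X = Fraction(sum(X.values()), N)                         # E(X)
sum_pr = sum(Fraction(len(V), N) for V in events)          # sum_j Pr(V_j)

# the proof's chain of (in)equalities, checked exactly:
assert pr_union <= e_X == sum_pr
```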

Page 102–109: Probability Recap

Union bound

THEOREM (union bound)

Let V1, . . . , Vk be k events. Then

Pr(⋃_{i=1}^{k} Vi) ≤ ∑_{i=1}^{k} Pr(Vi).

This bound is tight (=) when the events are all disjoint.
(Vi and Vj are disjoint iff Vi ∩ Vj is empty.)

EXAMPLE

S = {1, . . . , 6} is the set of outcomes of a die roll.

We define two events: V1 = {3, 4} and V2 = {1, 2, 3}.

[Venn diagram: V1 and V2 drawn inside S, overlapping at outcome 3; outcomes 5 and 6 lie outside both events.]

Pr(V1 ∪ V2) ≤ Pr(V1) + Pr(V2) = 1/3 + 1/2 = 5/6

In fact, Pr(V1 ∪ V2) = 2/3 (3 was ‘double counted’).

Typically the union bound is used when each Pr(Vi) is much smaller than 1/k.
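The die example works out exactly in rational arithmetic. This short sketch (Python, illustrative only) reproduces the 5/6 bound and the true probability 2/3:

```python
from fractions import Fraction

S = set(range(1, 7))        # outcomes of a fair die roll
V1, V2 = {3, 4}, {1, 2, 3}

def pr(V):
    """Probability of event V under the uniform measure on S."""
    return Fraction(len(V), len(S))

bound = pr(V1) + pr(V2)     # 1/3 + 1/2 = 5/6, the union bound
exact = pr(V1 | V2)         # |{1,2,3,4}| / 6 = 2/3, the true probability
```

The gap between 5/6 and 2/3 is exactly Pr({3}) = 1/6, the double-counted overlap.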

Page 110: Probability Recap

Summary

The sample space S is the set of outcomes of an experiment.

For x ∈ S, the probability of x, written Pr(x), is a real number between 0 and 1, such that ∑_{x∈S} Pr(x) = 1.

An event is a subset V of the sample space S, with Pr(V) = ∑_{x∈V} Pr(x).

A random variable (r.v.) Y is a function which maps x ∈ S to Y(x) ∈ ℝ.
The probability of Y taking value y is Pr(Y = y) = ∑_{x∈S s.t. Y(x)=y} Pr(x).

The expected value (the mean) of Y is E(Y) = ∑_{x∈S} Y(x) · Pr(x).

An indicator random variable is an r.v. that can only be 0 or 1. Fact: E(I) = Pr(I = 1).

THEOREM (Markov’s inequality)

If X is a non-negative r.v., then for all a > 0, Pr(X ≥ a) ≤ E(X)/a.

THEOREM (Linearity of expectation)

Let Y1, Y2, . . . , Yk be k random variables. Then E(∑_{i=1}^{k} Yi) = ∑_{i=1}^{k} E(Yi).

THEOREM (union bound)

Let V1, . . . , Vk be k events. Then Pr(⋃_{i=1}^{k} Vi) ≤ ∑_{i=1}^{k} Pr(Vi).