Random Variables & Expectation

Post on 14-Jan-2016


Random Variable

A random variable (r.v.) is a well-defined rule for assigning a numerical value to each possible outcome of an experiment.

Example:

experiment: taking a course
outcomes: grades A, B, C, D, F
sample space S: discrete & finite
random variable:
Y = 4 if grade is A
Y = 3 if grade is B
Y = 2 if grade is C
Y = 1 if grade is D
Y = 0 if grade is F

Experiment: throw 2 dice. What are the possible outcomes?

1,1 2,1 3,1 4,1 5,1 6,1

1,2 2,2 3,2 4,2 5,2 6,2

1,3 2,3 3,3 4,3 5,3 6,3

1,4 2,4 3,4 4,4 5,4 6,4

1,5 2,5 3,5 4,5 5,5 6,5

1,6 2,6 3,6 4,6 5,6 6,6

Define the random variable X to be the sum of the dots on the 2 dice.

For which outcomes does X = 9?

3,6 4,5 5,4 6,3

What is Pr(X=9)?

Since there are 36 equally likely outcomes, each has a probability of 1/36.

So since there are 4 outcomes that yield X=9, Pr(X=9) = 4/36 =1/9
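The count above can be checked by brute-force enumeration. Here is a minimal Python sketch; the names `outcomes` and `favorable` are illustrative and not part of the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes for which the random variable X (sum of the dots) equals 9.
favorable = [(a, b) for a, b in outcomes if a + b == 9]

# Each outcome has probability 1/36, so Pr(X=9) is just a ratio of counts.
pr_x_equals_9 = Fraction(len(favorable), len(outcomes))

print(favorable)      # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(pr_x_equals_9)  # 1/9
```

Counting works here only because the 36 outcomes are equally likely.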

Let's calculate the probabilities of all the possible values x of the random variable X.

x    Pr(X=x)
2    1/36
3    2/36
4    3/36
5    4/36
6    5/36
7    6/36
8    5/36
9    4/36
10   3/36
11   2/36
12   1/36

Let’s graph the probability distribution of X.

[Figure: bar graph of Pr(X=x) against x for x = 2, 3, ..., 12; the vertical axis runs from 0 to 8/36.]

Pr(X=x) = f(x) = p(x), as described in this table or graph, is called the probability distribution or probability mass function (p.m.f.).


Properties of Probability Distributions

1. 0 ≤ Pr(X=x) ≤ 1 for all x

2. $\sum_x p(x) = 1$

Cumulative Mass Function

$F(x_0) = \Pr(X \le x_0) = \sum_{x \le x_0} p(x)$

Cumulative Mass Function (2 dice problem)

x    Pr(X=x)   Pr(X≤x)
2    1/36      1/36
3    2/36      3/36
4    3/36      6/36
5    4/36      10/36
6    5/36      15/36
7    6/36      21/36
8    5/36      26/36
9    4/36      30/36
10   3/36      33/36
11   2/36      35/36
12   1/36      1

[Figure: step graph of F(x) against x for x from 0 to 13; the vertical axis is marked from 6/36 to 1.]
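The probability mass function and the cumulative mass function for the 2 dice problem can both be generated in a few lines. This is a sketch only; the names `counts`, `pmf`, and `cdf` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 equally likely outcomes give each sum x.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

xs = sorted(counts)                                  # 2, 3, ..., 12
pmf = {x: Fraction(counts[x], 36) for x in xs}       # p(x) = Pr(X = x)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))  # F(x) = Pr(X <= x)

for x in xs:
    print(x, pmf[x], cdf[x])
```

Note that `Fraction` prints values in lowest terms, so 2/36 appears as 1/18 and 30/36 as 5/6; the values are the same as in the table.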

Expectation, Expected Value, or Mean of a Random Variable

$E(X) = \sum_x x\,p(x)$

Notice the similarity of the definitions of the mean of a random variable & the mean of a frequency distribution for a population.

pop. freq. distrib.: $\mu = (1/N) \sum_{i=1}^{c} x_i f_i = \sum_{i=1}^{c} x_i (f_i/N)$

random variable: $E(X) = \sum_x x\,p(x)$

Recall that probability [p(x)] is the relative frequency [f/N] with which something occurs over the long run.

So these definitions are saying the same thing.

Example: Suppose that a stock broker wants to estimate the price of a certain stock one year from now. If the probability mass function of the price in a year is as given, determine the expected price.

x = price in one year   p(x)   xp(x)
94                      0.25   23.5
98                      0.25   24.5
102                     0.25   25.5
106                     0.25   26.5
Total                   1.00   100.0

So the expected price is E(X) = 100.0.

Notice that you do NOT divide by the number of observations when you’re done adding.

Also, the probabilities do not have to be equal; they just have to add up to one.
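The same calculation as a short Python sketch; the dictionary `pmf` and the name `expected_price` are illustrative.

```python
# Probability mass function of the stock price in one year (from the table).
pmf = {94: 0.25, 98: 0.25, 102: 0.25, 106: 0.25}

# The probabilities must add up to one.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# E(X) = sum of x * p(x); there is no division by the number of values.
expected_price = sum(x * p for x, p in pmf.items())

print(expected_price)  # 100.0
```

Changing the probabilities to unequal values that still sum to one works without any other change to the code.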

Theorem: Suppose that g(X) is a function of a random variable X, & the probability mass function of X is $p_X(x)$. Then the expected value of g(X) is

$E[g(X)] = \sum_x g(x)\,p_X(x)$

Example: Suppose $Y = g(X) = X^2$ & the distribution of X is as given below. Determine the mean of Y by using 1. the definition of expected value, & 2. the previous theorem.

x    p(x)
-2   0.1
-1   0.2
1    0.3
2    0.4

1. Using the definition of expected value, we first need the distribution of Y. Y = 1 when X = -1 or X = 1, so p(1) = 0.2 + 0.3 = 0.5. Y = 4 when X = -2 or X = 2, so p(4) = 0.1 + 0.4 = 0.5.

y    p(y)   yp(y)
1    0.5    0.5
4    0.5    2.0

E(Y) = 2.5

2. Using the theorem, we work directly with the distribution of X.

x    p(x)   y = x^2   y pX(x)
-2   0.1    4         0.4
-1   0.2    1         0.2
1    0.3    1         0.3
2    0.4    4         1.6

E(Y) = 2.5
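Both routes to E(Y) can be sketched in Python. The helper `g` and the variable names are illustrative; exact fractions are used so the two answers can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Distribution of X from the example.
p_x = {-2: Fraction(1, 10), -1: Fraction(2, 10), 1: Fraction(3, 10), 2: Fraction(4, 10)}

def g(x):
    """The function of X whose mean we want: Y = g(X) = X^2."""
    return x ** 2

# Method 1: build the distribution of Y, then apply the definition of E(Y).
p_y = defaultdict(Fraction)
for x, p in p_x.items():
    p_y[g(x)] += p          # values of x with the same g(x) pool their probability
mean_by_definition = sum(y * p for y, p in p_y.items())

# Method 2: the theorem, E[g(X)] = sum of g(x) * p(x).
mean_by_theorem = sum(g(x) * p for x, p in p_x.items())

print(mean_by_definition, mean_by_theorem)  # 5/2 5/2
```

The theorem saves the step of working out the distribution of Y.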

Definition: Variance of a random variable X

$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x)$

Theorem: The variance of X can also be calculated as follows:

$\sigma^2 = V(X) = E(X^2) - [E(X)]^2$

Standard Deviation of a random variable X

$\sigma = \sqrt{\sigma^2} = \sqrt{V(X)}$

Example: Suppose sales at a donut shop are distributed as below. Calculate (a) the mean number of donuts sold, (b) the variance (using both the definition of the variance & the theorem), & (c) the standard deviation.

x    p(x)
1    0.08
2    0.27
4    0.10
6    0.33
12   0.22

First, the mean: $\mu = E(X) = \sum_x x\,p(x)$.

Next, the variance using the definition: $\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x)$.

Now, the variance using the theorem: $V(X) = E(X^2) - [E(X)]^2$.

x    p(x)   xp(x)   x-μ     (x-μ)^2   (x-μ)^2 p(x)   x^2   x^2 p(x)
1    0.08   0.08    -4.64   21.53     1.72           1     0.08
2    0.27   0.54    -3.64   13.25     3.58           4     1.08
4    0.10   0.40    -1.64   2.69      0.27           16    1.60
6    0.33   1.98    0.36    0.13      0.04           36    11.88
12   0.22   2.64    6.36    40.45     8.90           144   31.68

Column totals: μ = 5.64, σ^2 = 14.51, E(X^2) = 46.32

$\sigma^2 = V(X) = E(X^2) - [E(X)]^2 = 46.32 - (5.64)^2 = 14.51$

And lastly, the standard deviation, by taking the square root of the variance: σ = 3.81.
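The donut shop calculation can be reproduced with a short Python sketch. The variable names are illustrative; results are rounded to two decimals to match the table.

```python
from math import sqrt

# Distribution of donut sales from the example.
pmf = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}

mean = sum(x * p for x, p in pmf.items())                          # E(X)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
second_moment = sum(x ** 2 * p for x, p in pmf.items())            # E(X^2)
var_theorem = second_moment - mean ** 2                            # E(X^2) - [E(X)]^2
std_dev = sqrt(var_theorem)

print(round(mean, 2), round(var_definition, 2), round(var_theorem, 2), round(std_dev, 2))
# 5.64 14.51 14.51 3.81
```

The two variance calculations agree up to floating-point rounding, as the theorem says they should.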

Important Theorem

If X has mean $\mu$ and variance $\sigma^2$, then $(X - \mu)/\sigma$ has mean 0 and variance 1.

Example: $(G - \mu)/\sigma$

Suppose your course grades have a mean of 2.7 and a standard deviation of 1.2.

Suppose you took your grades, subtracted 2.7 from each one, then divided those results by 1.2.

The new set of numbers would have a mean of 0 and a standard deviation of 1.
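The theorem can be checked numerically on any distribution. The sketch below is an illustration only: it reuses the donut shop distribution as an arbitrary choice (the grades example gives no full distribution), and the helpers `mean` and `variance` are names introduced here.

```python
from math import sqrt

def mean(pmf):
    """E(X) for a p.m.f. given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Any p.m.f. will do; this one is an arbitrary choice for illustration.
pmf_x = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}
mu, sigma = mean(pmf_x), sqrt(variance(pmf_x))

# Z = (X - mu) / sigma takes the value (x - mu) / sigma with probability p(x).
pmf_z = {(x - mu) / sigma: p for x, p in pmf_x.items()}

print(round(mean(pmf_z), 10), round(variance(pmf_z), 10))
```

Up to floating-point rounding, the printed mean is 0 and the printed variance is 1.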

Expectation Rules

Let k, a, & b be constants.

1. E(k) = k The mean of a constant is the constant.

2. V(k) = 0 The variance of a constant is zero.

3. E(a + bX) = a + b E(X)

4. $V(a + bX) = b^2 V(X)$

Example: If X has a mean of 3 and a variance of 2/3, what are the mean and variance of Y=5+2X ?

First find the mean E(Y) = E(5+2X). The rule is E(a + bX) = a + b E(X). Let a=5 & b=2. Then just plug into the formula. So,

E(Y) = E(5+2X) = 5 + 2 E(X) = 5 + 2(3) = 11.

Next find the variance V(Y) = V(5+2X). The rule is $V(a + bX) = b^2 V(X)$. Again let a=5 and b=2 and just plug into the formula.

$V(Y) = V(5+2X) = 2^2 V(X) = 4 V(X) = 4(2/3) = 8/3$.

Notice that the constant term shifts the mean but has no effect on the spread of the distribution.
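The rules can be checked against a concrete distribution. The example only gives the mean and variance of X, so the sketch below assumes a hypothetical X that is equally likely to be 2, 3, or 4, which happens to have mean 3 and variance 2/3. The helpers `mean` and `variance` are names introduced here.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a p.m.f. given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical X (an assumption): equally likely values 2, 3, 4.
pmf_x = {x: Fraction(1, 3) for x in (2, 3, 4)}

a, b = 5, 2
# Y = a + bX takes the value a + b*x with probability p(x).
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

print(mean(pmf_x), variance(pmf_x))  # 3 2/3
print(mean(pmf_y), variance(pmf_y))  # 11 8/3
```

The direct calculation on Y matches the values obtained from rules 3 and 4.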

Joint Probability Distribution for 2 Discrete Random Variables X & Y

p(x,y) = Pr(X=x and Y=y)

Properties of Joint Probability Distributions

1. $0 \le p(x,y) \le 1$ for all x and y

2. $\sum_x \sum_y p(x,y) = 1$

Example: Consider the following joint distribution of the number of jobs & the number of promotions of college graduates in their 1st 5 years out of college.

                      Number of Promotions (y)
Number of jobs (x)    1      2      3      4
1                     0.10   0.15   0.12   0.06
2                     0.05   0.07   0.10   0.05
3                     0.04   0.02   0.14   0.10

For example, the probability of 3 jobs & 2 promotions is 0.02.

We can determine the marginal distributions of the 2 random variables X & Y just as we did before for 2 events. Just add across the row or down the column.

For the probability of 1 job, add across the first row: 0.10 + 0.15 + 0.12 + 0.06 = 0.43. Similarly for the probabilities of 2 or 3 jobs.

For the probability of 1 promotion, add down the first column: 0.10 + 0.05 + 0.04 = 0.19. Similarly for the probabilities of 2, 3, or 4 promotions.

                      Number of Promotions (y)
Number of jobs (x)    1      2      3      4      pX(x)
1                     0.10   0.15   0.12   0.06   0.43
2                     0.05   0.07   0.10   0.05   0.27
3                     0.04   0.02   0.14   0.10   0.30
pY(y)                 0.19   0.24   0.36   0.21   1.00

Here pX(x) is the marginal probability of x and pY(y) is the marginal probability of y.

Notice again that the marginal probabilities for x must total one, and so must the marginal probabilities for y.
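Adding across rows and down columns is easy to express in code. A minimal Python sketch, with the joint distribution stored as a nested dictionary; the layout and the names `joint`, `p_x`, and `p_y` are choices made here for illustration.

```python
# Joint p.m.f. p(x, y): outer keys are jobs x, inner keys are promotions y.
joint = {
    1: {1: 0.10, 2: 0.15, 3: 0.12, 4: 0.06},
    2: {1: 0.05, 2: 0.07, 3: 0.10, 4: 0.05},
    3: {1: 0.04, 2: 0.02, 3: 0.14, 4: 0.10},
}

# Marginal of X: add across each row.
p_x = {x: sum(row.values()) for x, row in joint.items()}

# Marginal of Y: add down each column.
p_y = {y: sum(joint[x][y] for x in joint) for y in (1, 2, 3, 4)}

print({x: round(p, 2) for x, p in p_x.items()})  # {1: 0.43, 2: 0.27, 3: 0.3}
print({y: round(p, 2) for y, p in p_y.items()})  # {1: 0.19, 2: 0.24, 3: 0.36, 4: 0.21}
```

Rounding is applied only for display, since sums of decimal floats can be off in the last digit.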

Conditional Probabilities for Random Variables

Example

The probability that X is 2 given that Y is 3:

pX|Y(2|3) = Pr(X=2|Y=3)

= Pr(X=2 & Y=3)/Pr(Y=3).

The probability that Y is 2 given that X is 3:

pY|X(2|3) = Pr(Y=2|X=3)

= Pr(Y=2 & X=3)/Pr(X=3).

Let’s do the calculations using our previous example.


pX|Y(2|3) = Pr(X=2|Y=3)

= Pr(X=2 & Y=3)/Pr(Y=3)

= 0.10/0.36 = 0.278.

pY|X(2|3) = Pr(Y=2|X=3)

= Pr(Y=2 & X=3)/Pr(X=3)

= 0.02/0.30 = 0.067.
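The two conditional probabilities can be sketched in Python as a joint probability divided by the relevant marginal. The nested-dictionary layout and variable names are illustrative.

```python
# Joint p.m.f. p(x, y): outer keys are jobs x, inner keys are promotions y.
joint = {
    1: {1: 0.10, 2: 0.15, 3: 0.12, 4: 0.06},
    2: {1: 0.05, 2: 0.07, 3: 0.10, 4: 0.05},
    3: {1: 0.04, 2: 0.02, 3: 0.14, 4: 0.10},
}

pr_y3 = sum(joint[x][3] for x in joint)  # Pr(Y = 3): add down column 3
pr_x3 = sum(joint[3].values())           # Pr(X = 3): add across row 3

p_x2_given_y3 = joint[2][3] / pr_y3      # Pr(X = 2 | Y = 3)
p_y2_given_x3 = joint[3][2] / pr_x3      # Pr(Y = 2 | X = 3)

print(round(p_x2_given_y3, 3), round(p_y2_given_x3, 3))  # 0.278 0.067
```

Note that the two conditional probabilities use different denominators, so they are not interchangeable.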

Cumulative Joint Mass Function for 2 Discrete Random Variables X & Y

F(x,y) = Pr(X ≤ x and Y ≤ y)

Job/Promotion Example: Find probability that a person had 2 or fewer jobs & 3 or fewer promotions

Add the joint probabilities of every cell with x ≤ 2 and y ≤ 3:

F(2,3) = p(1,1) + p(1,2) + p(1,3) + p(2,1) + p(2,2) + p(2,3)

= 0.10 + 0.15 + 0.12 + 0.05 + 0.07 + 0.10

= 0.59
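The cumulative joint mass function is just a sum over the cells in the lower-left block of the table. A minimal Python sketch; the function name `joint_cdf` is introduced here for illustration.

```python
# Joint p.m.f. p(x, y): outer keys are jobs x, inner keys are promotions y.
joint = {
    1: {1: 0.10, 2: 0.15, 3: 0.12, 4: 0.06},
    2: {1: 0.05, 2: 0.07, 3: 0.10, 4: 0.05},
    3: {1: 0.04, 2: 0.02, 3: 0.14, 4: 0.10},
}

def joint_cdf(joint, x0, y0):
    """F(x0, y0) = Pr(X <= x0 and Y <= y0): sum every cell with x <= x0 and y <= y0."""
    return sum(
        p
        for x, row in joint.items()
        for y, p in row.items()
        if x <= x0 and y <= y0
    )

print(round(joint_cdf(joint, 2, 3), 2))  # 0.59
```

Evaluating the function at the largest values of x and y covers the whole table, so it returns 1 up to rounding.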

Independence

Recall that 2 events A & B were independent if Pr(A∩B)=Pr(A) Pr(B)

Similarly 2 random variables are independent if p(x,y) = pX(x) pY(y) for all values of x & y

In our previous example, are the number of jobs & number of promotions independent?


We must have p(x,y) = pX(x) pY(y) for all values of x & y.

To start, does p(1,1) equal pX(1) pY(1) ?

p(1,1) = 0.10

pX(1) pY(1) = 0.43 • 0.19

= 0.0817

≠ 0.10

So X & Y are not independent.

If that case had been equal, we wouldn’t be done yet. We’d have to verify that equality held for all the cells.
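Checking every cell by hand is tedious, so here is a sketch of the full check in Python. The function name `is_independent` and the tolerance are choices made here; a tolerance is needed because the probabilities are stored as floats.

```python
from math import isclose

# Joint p.m.f. p(x, y): outer keys are jobs x, inner keys are promotions y.
joint = {
    1: {1: 0.10, 2: 0.15, 3: 0.12, 4: 0.06},
    2: {1: 0.05, 2: 0.07, 3: 0.10, 4: 0.05},
    3: {1: 0.04, 2: 0.02, 3: 0.14, 4: 0.10},
}

def is_independent(joint, tol=1e-9):
    """True only if p(x, y) = pX(x) * pY(y) holds in every cell."""
    ys = next(iter(joint.values())).keys()
    p_x = {x: sum(row.values()) for x, row in joint.items()}
    p_y = {y: sum(joint[x][y] for x in joint) for y in ys}
    return all(
        isclose(joint[x][y], p_x[x] * p_y[y], abs_tol=tol)
        for x in joint
        for y in ys
    )

print(is_independent(joint))  # False
```

Because `all` stops at the first failing cell, the check ends as soon as p(1,1) is found to differ from pX(1) pY(1).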

Theorem: mean of a function of 2 random variables X & Y

$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p(x,y)$

Suppose that, based on the joint distribution of the length X & width Y of lumber sold by a lumberyard, we would like to determine the mean length, mean width, & mean area of the lumber.

So we want to calculate

E(X),

E(Y), and

E(XY).

Given the joint distribution below, calculate E(X), E(Y), & E(XY).

            Y
        2      4      6
X  4    0.05   0.05   0.10
   8    0.10   0.50   0.20

First, determine the marginal distributions of X and of Y, and check that the marginal distribution probabilities sum to 1.

            Y
        2      4      6      pX(x)
X  4    0.05   0.05   0.10   0.20
   8    0.10   0.50   0.20   0.80
pY(y)   0.15   0.55   0.30   1.00

Next we calculate the mean length & mean width.

For E(X), remember we need to multiply the values by their probabilities and add up. We get the values of X and their probabilities, multiply, and add up.

x    p(x)   xp(x)
4    0.20   0.80
8    0.80   6.40

E(X) = 7.20

For E(Y), we do the same thing. Get the values of Y and their probabilities, multiply, and add up.

y    p(y)   yp(y)
2    0.15   0.30
4    0.55   2.20
6    0.30   1.80

E(Y) = 4.30

To calculate the mean area E(XY), we use the theorem

$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p(x,y)$

For the mean area, g(x,y) = xy, so the theorem translates to

$E(XY) = \sum_x \sum_y xy\,p(x,y)$

To keep track of the xy terms, we are going to put them in our table, in parentheses next to each probability.

            Y
        2           4           6           pX(x)
X  4    0.05 (8)    0.05 (16)   0.10 (24)   0.20
   8    0.10 (16)   0.50 (32)   0.20 (48)   0.80
pY(y)   0.15        0.55        0.30        1.00

Next, we need to multiply the xy terms by the corresponding probabilities, and then add it all up.

So we have 0.05 (8) + 0.05 (16) + 0.10 (24) + 0.10 (16) + 0.50 (32) + 0.20 (48)

= 30.8 for the mean area.
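The theorem translates directly into a double sum over the cells of the table. A minimal Python sketch; the helper `expect` and the nested-dictionary layout are illustrative choices, not part of the slides.

```python
# Joint p.m.f. p(x, y): outer keys are lengths x, inner keys are widths y.
joint = {
    4: {2: 0.05, 4: 0.05, 6: 0.10},
    8: {2: 0.10, 4: 0.50, 6: 0.20},
}

def expect(g, joint):
    """E[g(X, Y)] = sum over x and y of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for x, row in joint.items() for y, p in row.items())

e_x = expect(lambda x, y: x, joint)       # mean length
e_y = expect(lambda x, y: y, joint)       # mean width
e_xy = expect(lambda x, y: x * y, joint)  # mean area

print(round(e_x, 2), round(e_y, 2), round(e_xy, 2))  # 7.2 4.3 30.8
```

Passing g(x, y) = x or g(x, y) = y gives the same means as the marginal distributions, so one function covers all three calculations.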

You might wonder if we could get E(XY) by just multiplying E(X) by E(Y).

The answer is generally not.

In our example, we had E(X) = 7.2, E(Y) =4.3, & E(XY) = 30.8

E(X) E(Y) = 30.96, not 30.80.

Close in this case, but not the same.

If X and Y are independent, then it is true that E(XY) = E(X) E(Y).

It may also hold occasionally in other cases.

But generally, it doesn’t work.

Definition: Covariance of X & Y

$C(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x,y)$

What does this mean?

Suppose that two variables tend to move in the same direction, like study time and grades.

When x is large, so that it is larger than its mean, then $x - \mu_X > 0$.

When x is large, y tends to be large as well, so that $y - \mu_Y > 0$ also.

Remember that the p(x,y) values are probabilities and therefore cannot be negative.

So those terms in the formula would look like

$(x - \mu_X)(y - \mu_Y)\,p(x,y)$ with signs (+)(+)(+).

These products are positive.

Similarly, since x and y tend to be small together, we have $x - \mu_X < 0$ with $y - \mu_Y < 0$ too.

Those terms would look like

$(x - \mu_X)(y - \mu_Y)\,p(x,y)$ with signs (-)(-)(+).

These products are positive too.

So we're adding up a lot of positive numbers.

What all that means is that when 2 variables tend to move in the same direction, the covariance will be positive.

When 2 variables tend to move in opposite directions,

their covariance C(X,Y) < 0,

perhaps like party time and grades.

If variables don’t tend to move either in the same or opposite directions,

their covariance C(X,Y) = 0.

This case includes independent variables.

It is usually easier to calculate covariances using this theorem.

Theorem: C(X,Y) = E(XY) – E(X) E(Y)

Returning to the lumber example

Remember we had E(X) = 7.2, E(Y) = 4.3, & E(XY) = 30.8

Then the covariance would be

C(X,Y) = E(XY) – E(X) E(Y)

= (30.8) – (7.2)(4.3)

= - 0.16
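Both the definition and the theorem can be checked on the lumber example. A Python sketch, with the helper `expect` and the variable names introduced here for illustration.

```python
# Joint p.m.f. p(x, y): outer keys are lengths x, inner keys are widths y.
joint = {
    4: {2: 0.05, 4: 0.05, 6: 0.10},
    8: {2: 0.10, 4: 0.50, 6: 0.20},
}

def expect(g, joint):
    """E[g(X, Y)] = sum over x and y of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for x, row in joint.items() for y, p in row.items())

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)

# Definition: C(X,Y) = E[(X - mu_X)(Y - mu_Y)]
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)

# Theorem: C(X,Y) = E(XY) - E(X) E(Y)
cov_theorem = expect(lambda x, y: x * y, joint) - mu_x * mu_y

print(round(cov_definition, 2), round(cov_theorem, 2))  # -0.16 -0.16
```

The theorem needs only three expectations, which is why it is usually the easier route by hand.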

Difficulty

The value of the covariance changes when you change units.

That is, you get different answers if you use feet, inches, or meters.

So it’s difficult to tell if a particular answer means a strong relationship or not.

Fortunately, we have a solution to this problem …

Correlation Coefficient

The correlation coefficient is similar to the covariance, but it doesn’t vary with the units used.

$\rho(X,Y) = \frac{C(X,Y)}{\sigma_X \sigma_Y}$

The correlation coefficient is denoted by the Greek letter rho, $\rho$.

It's computed by dividing the covariance of X & Y by the product of the standard deviations of X & of Y.

The correlation coefficient is always between -1 and 1:

$-1 \le \rho \le 1$

So, if your correlation coefficient is close to 1, you have a strong positive relationship.

If it is close to -1, you have a strong negative relationship.

If it is close to zero, there is no strong linear relationship at all.

Back to the lumber example again

We had C(X,Y) = -0.16.

We need the standard deviations of X and Y, which we have not calculated yet.

$\rho(X,Y) = \frac{C(X,Y)}{\sigma_X \sigma_Y}$

This is what we had for X so far.

x    p(x)   xp(x)
4    0.20   0.80
8    0.80   6.40
E(X) = 7.20

Recall we said previously that we can calculate V(X) as V(X) = E(X²) – [E(X)]².

We have E(X) but we need E(X²).

The theorem E[g(X)] = Σ g(x)p(x) gives us

E(X²) = Σ x²p(x)

x    p(x)   xp(x)   x²    x²p(x)
4    0.20   0.80    16    3.2
8    0.80   6.40    64    51.2
E(X) = 7.20   E(X²) = 54.4

Now we need to subtract to get V(X).

V(X) = E(X²) – [E(X)]² = 54.4 – (7.2)² = 2.56

Take the square root to get the standard deviation σ_X.

σ_X = 1.60

We do the same thing with Y.

Get y², multiply by p(y), and add to get E(Y²).

y    p(y)   yp(y)   y²    y²p(y)
2    0.15   0.30    4     0.60
4    0.55   2.20    16    8.80
6    0.30   1.80    36    10.80
E(Y) = 4.30   E(Y²) = 20.20

Subtract to get V(Y).

V(Y) = E(Y²) – [E(Y)]² = 20.20 – (4.3)² = 1.71

Take the square root to get the standard deviation σ_Y.

σ_Y = 1.31

Now we have everything we need to compute the correlation coefficient for the lumber problem.

ρ(X,Y) = C(X,Y) / (σ_X σ_Y) = -0.16 / [(1.60)(1.31)] = -0.076

This number is much closer to 0 than it is to -1.

So the negative relation between the length & width of the lumber is very weak.
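The lumber calculation can be re-traced in a few lines of Python. This is only a sketch: the helper name `mean_and_variance` and the dictionary layout are illustrative choices, and E(XY) = 30.8 is taken as given from the earlier slide rather than recomputed.

```python
from math import sqrt

def mean_and_variance(pmf):
    """Return (E(X), V(X)) for a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    mean_sq = sum(x**2 * p for x, p in pmf.items())
    # V(X) = E(X^2) - [E(X)]^2
    return mean, mean_sq - mean**2

# Marginal distributions from the lumber example.
p_x = {4: 0.20, 8: 0.80}
p_y = {2: 0.15, 4: 0.55, 6: 0.30}
e_xy = 30.8  # E(XY), taken as given from the earlier calculation

e_x, v_x = mean_and_variance(p_x)
e_y, v_y = mean_and_variance(p_y)

cov = e_xy - e_x * e_y               # C(X,Y) = E(XY) - E(X)E(Y)
rho = cov / (sqrt(v_x) * sqrt(v_y))  # correlation coefficient

print(round(cov, 2), round(rho, 3))
```

The printed values should agree with the hand calculation, a covariance of -0.16 and a correlation of about -0.076.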

Theorem

1. E(aX + bY) = aE(X) + bE(Y)

2. V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

Example:

The mean & variance of X are 1 & 5 respectively. The mean & variance of Y are 2 & 6 respectively.

The covariance of X & Y is 7. Determine the mean & variance of 4X + 3Y.

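Recall:

E(aX + bY) = aE(X) + bE(Y)

V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

To solve this problem what should “a” & “b” be? a is 4 & b is 3.

E(aX + bY) = aE(X) + bE(Y) = 4(1) + 3(2) = 4 + 6 = 10

V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

= 4²V(X) + 3²V(Y) + 2(4)(3)C(X,Y)

= 16(5) + 9(6) + 24(7) = 80 + 54 + 168 = 302

(Treat these numbers as arithmetic practice only: a covariance of 7 with variances of 5 & 6 would imply a correlation coefficient of 7/√30 ≈ 1.28, which is outside the range -1 to 1.)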
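The two-part theorem translates directly into a small function. The sketch below is illustrative only; the function name `linear_combination_moments` and its argument names are made up for this example.

```python
def linear_combination_moments(a, b, mean_x, var_x, mean_y, var_y, cov_xy):
    """Return (E(aX + bY), V(aX + bY)) from the moments of X and Y."""
    mean = a * mean_x + b * mean_y
    variance = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, variance

# The example above: E(X) = 1, V(X) = 5, E(Y) = 2, V(Y) = 6, C(X,Y) = 7, with a = 4 and b = 3.
mean, variance = linear_combination_moments(4, 3, 1, 5, 2, 6, 7)
print(mean, variance)
```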

Consider the following joint distribution of X & Y.

           y
        2      4
x  1   0.20   0.25
   3   0.15   0.20
   5   0.15   0.05

Determine the following:

a. The mean & variance of X

b. The mean & variance of Y

c. The covariance & correlation coefficient of X & Y

d. The mean & variance of X+Y

First, determine the marginal distribution of X (the row sums) and the marginal distribution of Y (the column sums).

           y
        2      4      pX(x)
x  1   0.20   0.25    0.45
   3   0.15   0.20    0.35
   5   0.15   0.05    0.20
pY(y)  0.50   0.50    1

Verify that each marginal distribution sums to 1.

Set up a table to compute the mean & variance of X.

Fill in the values of X and their probabilities, multiply x by p(x), and add to get the mean of X.

To calculate the variance, first compute E(X²) = Σ x²p(x).

x    p(x)   xp(x)   x²p(x)
1    0.45   0.45    0.45
3    0.35   1.05    3.15
5    0.20   1.00    5.00
E(X) = 2.50   E(X²) = 8.60

Calculate the variance as V(X) = E(X²) – [E(X)]².

V(X) = E(X²) – [E(X)]² = 8.6 – (2.5)² = 2.35

Set up a table to compute the mean & variance of Y.

Fill in the values of Y and their probabilities, multiply y by p(y), and add to get E(Y).

To calculate the variance, first compute E(Y²) = Σ y²p(y).

y    p(y)   yp(y)   y²p(y)
2    0.5    1       2
4    0.5    2       8
E(Y) = 3   E(Y²) = 10

Calculate the variance as V(Y) = E(Y²) – [E(Y)]².

V(Y) = E(Y²) – [E(Y)]² = 10 – (3)² = 1

To determine C(X,Y) = E(XY) – E(X) E(Y), we need

E(XY) = Σ_x Σ_y xy p(x,y)

As before, we’ll put the xy values in the table (in parentheses) next to the probability values.

           y
        2            4            pX(x)
x  1   0.20 (2)     0.25 (4)      0.45
   3   0.15 (6)     0.20 (12)     0.35
   5   0.15 (10)    0.05 (20)     0.20
pY(y)  0.50         0.50          1.00

Then we multiply and add.

E(XY) = (0.20)(2) + (0.25)(4) + (0.15)(6) + (0.20)(12) + (0.15)(10) + (0.05)(20)

= 0.40 + 1.00 + 0.90 + 2.40 + 1.50 + 1.00

= 7.20

C(X,Y) = E(XY) – E(X) E(Y)

Since E(XY) = 7.2, E(X) = 2.5, & E(Y) = 3.0,

C(X,Y) = 7.2 – (2.5)(3)

= 7.2 – 7.5

= -0.3

Next, the correlation coefficient.

Since C(X,Y) = -0.3, V(X) = 2.35, & V(Y) = 1,

ρ(X,Y) = C(X,Y) / (σ_X σ_Y) = -0.3 / (√2.35 √1) = -0.196

The next part of the problem asked for E(X+Y)

We know that E(X) = 2.5 and E(Y) = 3.0.

E(aX+bY) = a E(X) + b E(Y)

What should “a” & “b” be? 1 & 1

So E(X+Y) = 1 E(X) + 1 E(Y) = E(X) + E(Y) = 2.5 + 3.0 = 5.5

Lastly: V(X+Y)

We know V(X) = 2.35, V(Y) = 1, & C(X,Y) = -0.3.

V(aX+bY) = a² V(X) + b² V(Y) + 2ab [C(X,Y)]

What are “a” & “b”? 1 & 1

V(X+Y) = 1² V(X) + 1² V(Y) + 2(1)(1)[C(X,Y)]

= V(X) + V(Y) + 2[C(X,Y)]

= 2.35 + 1 + 2 (-0.3)

= 2.75
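As a cross-check on parts a through d, the whole problem can be computed straight from the joint table. The following is a sketch, not part of the original solution; the dictionary `joint` and the helper `expect` are names chosen here for illustration.

```python
from math import sqrt

# Joint distribution p(x, y) from the example, keyed by (x, y).
joint = {
    (1, 2): 0.20, (1, 4): 0.25,
    (3, 2): 0.15, (3, 4): 0.20,
    (5, 2): 0.15, (5, 4): 0.05,
}

def expect(g):
    """E[g(X, Y)]: sum of g(x, y) * p(x, y) over every cell of the joint table."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
v_x = expect(lambda x, y: x**2) - e_x**2
v_y = expect(lambda x, y: y**2) - e_y**2
cov = expect(lambda x, y: x * y) - e_x * e_y
rho = cov / sqrt(v_x * v_y)

# Mean and variance of X + Y via the theorem with a = b = 1.
e_sum = e_x + e_y
v_sum = v_x + v_y + 2 * cov

# Independent check: V(X+Y) computed directly from the joint table.
v_sum_direct = expect(lambda x, y: (x + y)**2) - e_sum**2

print(round(e_x, 2), round(v_x, 2), round(e_y, 2), round(v_y, 2))
print(round(cov, 2), round(rho, 3), round(e_sum, 2), round(v_sum, 2))
```

Computing V(X+Y) both ways is a useful habit: if the theorem-based value and the direct value disagree, one of the intermediate sums is wrong.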

Specific Discrete Distributions

1. Uniform

2. Binomial

3. Hypergeometric

4. Multinomial

5. Poisson

Uniform Distribution

The uniform distribution assigns all the possible values equal probabilities.

example: a fair die has possible values 1, 2, 3, 4, 5, and 6 each with probability 1/6.

[Graph of Uniform Distribution, Example: Fair Die. Horizontal axis: value on die (1 through 6). Vertical axis: probability, with each value at 1/6.]

Binomial Distribution

Example: What is the probability of getting 3 heads on 5 tosses of an unfair (lopsided) coin whose probability of getting a head on any toss is 1/3?

What is the probability of getting specifically HTHHT ?

(1/3) (2/3) (1/3) (1/3) (2/3)

= (1/3)³ (2/3)²

What is the probability of any other specific outcome with 3 heads on 5 tosses?

The same.

So we just have to figure out how many different ways you can get 3 heads on 5 tosses, and multiply that by the probability of each individual outcome.

That will give us the probability of getting 3 heads on 5 tosses.

How many ways can you get 3 heads on 5 tosses?

It’s the number of combinations of 5 objects taken 3 at a time.

₅C₃ = 5! / [3! (5–3)!] = 5! / (3! 2!) = 120 / [(6)(2)] = 10

So the probability of getting 3 heads on 5 tosses is

₅C₃ (1/3)³ (2/3)² = (10)(1/27)(4/9) = 40/243 = 0.1646

In general, the probability of getting x successes on n trials in which the probability of success on any given trial is π is

p(x) = ₙCₓ π^x (1 – π)^(n–x)

This is the binomial distribution.
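The general formula is short enough to write as a function. This is a minimal sketch: the name `binomial_pmf` and the parameter name `p` for the per-trial success probability are choices made here, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes on n independent trials, each with success probability p."""
    # comb(n, x) counts the arrangements; p**x * (1 - p)**(n - x) is the probability of each one.
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The lopsided-coin example: 3 heads on 5 tosses with probability 1/3 of heads on each toss.
three_heads = binomial_pmf(3, 5, 1/3)
print(round(three_heads, 4))
```

The result should match the 40/243 worked out by hand above.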

Notes

1. 0! = 1

2. Each trial that can result in either success or failure is called a Bernoulli trial.

Example: If the probability that any person passes this course is 0.95, what is the probability that in a class of 30 people, exactly 28 people pass?

ₙCₓ π^x (1 – π)^(n–x) = ₃₀C₂₈ (0.95)²⁸ (0.05)² = 0.259

where ₃₀C₂₈ = 30! / (28! 2!) = (30)(29)(28!) / (28! 2!) = (30)(29) / 2 = 435

Let’s go back to the example in which we flipped a coin 5 times & the probability of heads on each toss was 1/3.

For 3 heads, the probability was 0.1646.

Using the binomial formula, we can determine the probabilities of the other possibilities.

x    p(x)
0    0.1317
1    0.3292
2    0.3292
3    0.1646
4    0.0412
5    0.0041
     1

If we graph this distribution, it looks like:

[Graph: probability (vertical axis, 0.05 to 0.35) against number of heads (horizontal axis, 0 to 5)]

Notice that there is a bump on the left and a tail on the right.

Such a distribution is said to be skewed to the right.

The skew is where the tail is.

Binomial Distribution

The binomial distribution graph we just did was for π = 1/3 and the skew was to the right.

A binomial distribution with π < ½ will always have a skew to the right.

What do you think the distribution will look like if π > ½?

It will be skewed to the left. (The tail will be on the left & the bump will be on the right.)

What do you think the distribution will look like if π = ½?

It will be symmetric. The left and right sides will be mirror images of each other.

If the number of trials n (tosses in our example) is large, the graph will be roughly symmetric even if π ≠ ½.

How large does n have to be for the graph to be roughly symmetric? That depends on how far π is from ½.

There are two sets of rules that are sometimes used to determine if the graph is roughly symmetric.

One rule requires that nπ ≥ 5 and n(1 – π) ≥ 5.

The other rule requires that nπ(1 – π) ≥ 3.

These rules are not exactly equivalent, but they both work reasonably well.

Mean & Variance of the Binomial Distribution

Mean: μ = nπ

Variance: σ² = nπ(1 – π)

Example: What are the mean, variance, & standard deviation for our binomial distribution example in which n = 5 & π = 1/3?

Mean: μ = nπ = (5)(1/3) = 5/3

Variance: σ² = nπ(1 – π) = (5)(1/3)(2/3) = 10/9

Standard Deviation: σ = √(10/9) = 1.054
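One way to convince yourself of these shortcut formulas is to build the full distribution for the coin example and compute the mean and variance the long way, by summation. The sketch below does that; the variable names are illustrative and `math.comb` assumes Python 3.8 or later.

```python
from math import comb, sqrt

n, p = 5, 1/3  # 5 tosses, probability 1/3 of heads on each toss

# Full distribution: probability of x heads for x = 0, 1, ..., 5.
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# Mean and variance by summation over the distribution.
mean = sum(x * px for x, px in pmf.items())
variance = sum(x**2 * px for x, px in pmf.items()) - mean**2
std_dev = sqrt(variance)

for x, px in pmf.items():
    print(x, round(px, 4))
print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The summed values should agree with the shortcut results of 5/3 for the mean and 10/9 for the variance.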

On an Excel spreadsheet, you can get the binomial distribution as follows:

click insert, and then click function

select statistical as the category of function, scroll down to the binomdist function, and click on it

fill in the information in the dialog box.

Using Excel to calculate Binomial Probabilities

Suppose that you wanted to calculate a messy binomial, such as the probability of between 60 and 70 successes inclusive, on 100 trials with

success probability on each trial of 0.64.

This would be a lot of work with just a calculator. You would have to calculate 11 separate binomial probabilities (the probabilities for 60, 61, 62, … 70) and then add them up.

It’s much easier with Excel.

You can calculate the (cumulative) probability of 70 or fewer successes.

Then calculate the cumulative probability of 59 or fewer successes.

Then take the difference.

Remember: you want the probability of between 60 and 70 successes inclusive, on 100 trials with

success probability on each trial of 0.64.

To get the probability of 70 or fewer successes, specify the following:

# of successes: 70
# of trials: 100
prob. of success on any trial: 0.64
cumulative: True (because you want 70 or fewer, not just 70)

To get the probability of 59 or fewer successes, specify the following:

# of successes: 59
# of trials: 100
prob. of success on any trial: 0.64
cumulative: True

Then just subtract the two cumulative function values you calculated.

If you do this, you get 0.91368 – 0.17394 = 0.7397
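The same "difference of two cumulative probabilities" idea works outside a spreadsheet too. The following Python sketch is an alternative to the Excel steps, not a description of how Excel computes its result; the function names are made up for this example and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes on n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """Cumulative probability of x or fewer successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 100, 0.64

# Pr(60 <= X <= 70) = Pr(X <= 70) - Pr(X <= 59)
prob_60_to_70 = binomial_cdf(70, n, p) - binomial_cdf(59, n, p)

# The long way: add the 11 individual probabilities for 60, 61, ..., 70.
direct_sum = sum(binomial_pmf(k, n, p) for k in range(60, 71))

print(round(prob_60_to_70, 4), round(direct_sum, 4))
```

Note that the second cumulative value uses 59, not 60, so that 60 itself stays inside the interval.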

We can also study binomial problems using proportions.

For example, we might want to know the probability of getting 60% heads on 5 tosses of a coin with probability of heads on each toss of 1/3. (This is the same as getting 3 heads.)

In general, if X is the number of successes on n trials, the proportion of successes is X/n.

We can easily determine the mean & variance of this binomial proportion variable X/n.

If again π is the probability of success on any given trial,

E(X/n) = π

V(X/n) = π(1 – π)/n

When can we use the binomial distribution?

1. We have exactly two possibilities on each trial (success or failure, heads or tails, male or female, yes or no, etc.)

2. The probability of success is the same on each trial.

3. The trials are independent. (What happens on one trial has no effect on what happens on the next trial.)

Sampling with & without Replacement

Suppose we have a bowl with 6 red and 4 green marbles. We select 3 marbles at random without replacement. We want to know the probability of selecting exactly 2 red marbles.

What’s the probability of getting a red marble on the 1st draw?

6/10

What’s the probability of getting a red marble on the 2nd draw?

It depends on what we got on the first draw.

If we got a red one, then the probability is 5/9.

If we got a green one, then the probability is 6/9.

Since the probability varies from trial to trial, we can not use the binomial distribution.

We will discuss very shortly what we use instead.

What if we selected the marbles with replacement?

Then the probability of a red marble would be the same on each draw, regardless of what you pulled out previously.

Then we could use the binomial distribution.

Suppose that instead of having 6 red marbles and 4 green marbles, we had 6000 red ones and 4000 green ones.

The probability of red on the 1st draw would be 6,000/10,000 = 0.6 .

If we got red on the 1st draw, the probability of red on the 2nd draw would be 5999/9999 = 0.59996

If we got green on the 1st draw, the probability of red on the 2nd would be 6000/9999 = 0.60006

These three numbers are very close.

So you could use the binomial distribution to get a very good approximation of the probability.

So if we have two options on each trial, when can we use the binomial distribution?

1. If we sample with replacement, or

2. We sample without replacement, but the sample is small relative to the population.

A rule that is often used is that the sample is less than 5% of the population (n < 0.05 N).

If our sample is more than 5% of our population, then we will use the

hypergeometric distribution.

Let’s return to our marble problem.

Suppose we have a bowl with 6 red and 4 green marbles. We select 3 marbles at random without replacement. We want to know the probability of selecting exactly 2 red marbles.

Remember that the number of ways of selecting x objects from n is ₙCₓ.

So there are ₆C₂ ways of selecting 2 red marbles from 6.

There are ₄C₁ ways of selecting 1 green marble from 4.

There are ₁₀C₃ ways of selecting 3 marbles from 10.

So the probability of getting exactly 2 red marbles on 3 draws will be

(₆C₂)(₄C₁) / (₁₀C₃)

where

₆C₂ = # of ways of getting the 2 red marbles out of 6

₄C₁ = # of ways of getting the 1 green marble out of 4

₁₀C₃ = # of ways of getting 3 marbles out of 10.

and our probability is

(₆C₂)(₄C₁) / (₁₀C₃) = [6! / (2! 4!)] [4! / (1! 3!)] / [10! / (3! 7!)]

= [(6·5·4·3·2·1) / ((2·1)(4·3·2·1))] [(4·3·2·1) / ((1)(3·2·1))] / [(10·9·8·7·6·5·4·3·2·1) / ((3·2·1)(7·6·5·4·3·2·1))]

= (15)(4) / 120 = 60 / 120 = 0.5
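The marble calculation can be sketched in Python, and the same function shows why the binomial is a good stand-in when the population is large. The function name `hypergeometric_two_groups` and its argument names are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def hypergeometric_two_groups(x, n, successes, failures):
    """Probability of exactly x 'success' items in n draws without replacement."""
    total = successes + failures
    return comb(successes, x) * comb(failures, n - x) / comb(total, n)

# 2 red marbles in 3 draws from a bowl of 6 red and 4 green.
small_bowl = hypergeometric_two_groups(2, 3, 6, 4)

# Same question with 6000 red and 4000 green: the sample is tiny relative to the population.
big_bowl = hypergeometric_two_groups(2, 3, 6000, 4000)

# Binomial calculation that treats every draw as having probability 0.6 of red.
binomial_approx = comb(3, 2) * 0.6**2 * 0.4**1

print(round(small_bowl, 4), round(big_bowl, 4), round(binomial_approx, 4))
```

With the small bowl the exact answer differs noticeably from the binomial value, while with the large bowl the two nearly coincide, which is the point of the 5% rule.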

The hypergeometric distribution can also be used if you have more

than 2 categories.

If you had 3 categories, for example, you would have 3 combinations in the numerator instead of two.

What do you do if the probabilities are constant from trial to trial but you have more than 2 categories?

You use the multinomial distribution, which is a generalization of the binomial.

Recall that the formula for the binomial is

ₙCₓ π^x (1 – π)^(n–x)

where π is the probability of success and 1 – π is the probability of failure.

Remember that this is equal to

[n! / (x! (n–x)!)] π^x (1 – π)^(n–x)

Suppose we have k outcomes for each trial instead of 2, and their probabilities are π₁, π₂, π₃, … πₖ.

Then on n trials, the probability of x₁ outcomes of type 1, x₂ outcomes of type 2, x₃ outcomes of type 3, and … xₖ outcomes of type k would be

prob. = [n! / (x₁! x₂! x₃! … xₖ!)] π₁^x₁ π₂^x₂ π₃^x₃ … πₖ^xₖ

where x₁ + x₂ + x₃ + … + xₖ = n and π₁ + π₂ + π₃ + … + πₖ = 1

Example: Suppose that at a fair, children pay money to reach into a container, which holds a large number of toys. 50% are of type 1, 30% are of type 2, & 20% are of type 3.

Sally pays for 3 toys, and reaches into the box and grabs 3 at random. What is the probability that she gets one of each type?

prob. = [n! / (x₁! x₂! x₃!)] π₁^x₁ π₂^x₂ π₃^x₃

= [3! / (1! 1! 1!)] (0.50)¹ (0.30)¹ (0.20)¹

= [6 / ((1)(1)(1))] (0.50)(0.30)(0.20)

= 6 (0.03) = 0.18
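Here is a small Python sketch of the multinomial formula applied to Sally's toys. The function name `multinomial_pmf` and its list-based arguments are choices made for this illustration, and `math.prod` assumes Python 3.8 or later.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing counts[i] outcomes of type i, given per-trial probabilities probs[i]."""
    n = sum(counts)
    # n! / (x1! x2! ... xk!), kept as an exact integer at every step.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# One toy of each type, with type probabilities 0.50, 0.30, and 0.20.
one_of_each = multinomial_pmf([1, 1, 1], [0.50, 0.30, 0.20])
print(round(one_of_each, 4))

# With only two categories the formula reduces to the binomial (3 heads, 2 tails, probability 1/3 of heads).
print(round(multinomial_pmf([3, 2], [1/3, 2/3]), 4))
```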

Our fifth discrete probability distribution is the Poisson distribution.

The Poisson distribution has outcome possibilities 0, 1, 2, 3, … that describe the number of occurrences per unit of time or per unit of space.

It applies in problems involving requests for service such as at expressway tollbooths, supermarket checkout counters, bank teller windows, airport runways, and repair shops.

Poisson Distribution Formula

p(x) = e^(–λ) λ^x / x!

where x is the number of occurrences and λ is the mean rate of occurrence.

Remember that e is a constant that is approximately equal to 2.71828.

Example: If a bank serves on average 1 customer per minute, (a) what is the probability that exactly 2 customers will enter the bank in the same particular minute?

The mean rate of occurrence is λ = 1.

Pr(X = 2) = e^(–λ) λ^x / x! = e^(–1) (1)² / 2! = e^(–1) / 2 = 0.368 / 2 = 0.184

(b) What is the probability that 2 or more customers will enter in the same minute?

We want Pr(X ≥ 2) = Pr(X=2) + Pr(X=3) + Pr(X=4) + …

Even though these terms diminish in size, you would have to do a lot of calculations to get a good approximation.

There’s a much easier way to do this problem. Use the complement.

The complement (or opposite) of “2 or more customers” is “1 or fewer customers.”

So Pr(X ≥ 2) = 1 - Pr(X ≤ 1). Let’s do the problem that way.

The mean rate of occurrence is still 1.

Pr(X ≥ 2) = 1 – Pr(X ≤ 1)

= 1 – [Pr(X=0) + Pr(X=1)]

= 1 – [e^(–1) (1)^0 / 0! + e^(–1) (1)^1 / 1!]

= 1 – [e^(–1) + e^(–1)]

= 1 – [0.368 + 0.368] = 0.264
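Both parts of the bank example can be checked with a short Python sketch. The function name `poisson_pmf` and the parameter name `rate` (standing for the mean rate of occurrence) are illustrative choices, not notation from the slides.

```python
from math import exp, factorial

def poisson_pmf(x, rate):
    """Probability of exactly x occurrences when the mean rate of occurrence is `rate`."""
    return exp(-rate) * rate**x / factorial(x)

rate = 1  # on average 1 customer per minute

# (a) exactly 2 customers in a given minute
exactly_two = poisson_pmf(2, rate)

# (b) 2 or more customers, via the complement: 1 - [Pr(X=0) + Pr(X=1)]
two_or_more = 1 - (poisson_pmf(0, rate) + poisson_pmf(1, rate))

print(round(exactly_two, 3), round(two_or_more, 3))
```

Using the complement avoids summing an infinite tail: only the two probabilities below the cutoff are needed.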

Mean & Variance of a Poisson Distributed Random Variable

Not surprisingly, the mean is λ, since we’ve been referring to that Poisson parameter as the mean rate of occurrence.

It turns out that the variance is also λ.
