Random Variables: Expectations and Variances

Charlie Gibbons
Political Science 236
September 22, 2008
Outline

1 Expectations
  Of Random Variables
  Of Functions of Random Variables
  Properties
  Conditional Expectations

2 Dispersion
  Variance
  Conditional Variance
  Covariance and Correlation
Expectations of Random Variables

The expectation, E(X) ≡ µ, of a random variable X is simply the weighted average of the possible realizations, weighted by their probabilities.

For discrete random variables, this can be written as

E(X) = \sum_{x \in X} \Pr(X = x) \, x = \sum_{x \in X} f(x) \, x.

For continuous random variables:

E(X) = \int_{-\infty}^{\infty} x f(x) \, dx = \int_{-\infty}^{\infty} x \, dF
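A minimal NumPy sketch of both cases (the fair die and the exponential distribution are illustrative assumptions, not examples from the slides): the discrete expectation is a probability-weighted average, and the continuous integral can be approximated by the mean of simulated draws.

```python
import numpy as np

# Discrete case: E(X) = sum over x of Pr(X = x) * x for a fair die.
x = np.array([1, 2, 3, 4, 5, 6])
f = np.full(6, 1 / 6)                      # Pr(X = x) for each realization
print(f @ x)                               # E(X) = 3.5

# Continuous case: the sample mean of many draws approximates the integral.
draws = np.random.default_rng(0).exponential(scale=2.0, size=100_000)
print(draws.mean())                        # ~2.0, the mean of this exponential
```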
Expectations of Functions of Random Variables

The definition of expectation can be generalized to functions of random variables, g(X).

We have the equations

E[g(X)] = \sum_{x \in X} f(x) g(x)

and

E[g(X)] = \int_{-\infty}^{\infty} g(x) \, dF

for discrete and continuous random variables, respectively.
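The same sketch extends to functions of X: weight g(x) by f(x), with no need to derive the distribution of g(X) itself (the die example is again an assumption).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
f = np.full(6, 1 / 6)                      # pmf of a fair die
g = x ** 2                                 # g(x) = x^2
print(f @ g)                               # E[X^2] = 91/6, about 15.17
```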
Properties of Expectations

Expectations are linear operators, i.e.,

E(a · g(X) + b · h(X) + c) = a · E[g(X)] + b · E[h(X)] + c.

Note that, in general, E[g(X)] ≠ g[E(X)].

Jensen’s inequality states that, for a convex function g(x), E[g(X)] ≥ g[E(X)]. For concave functions, the inequality is reversed.
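A quick simulation check of Jensen’s inequality for the convex function g(x) = x^2 (the Normal distribution with µ = 1, σ = 2 is an assumed example):

```python
import numpy as np

x = np.random.default_rng(1).normal(loc=1.0, scale=2.0, size=100_000)
print((x ** 2).mean())                     # E[X^2] = µ^2 + σ^2 = 5 (approx.)
print(x.mean() ** 2)                       # g(E[X]) = [E(X)]^2 ~ 1, smaller
```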
Properties of Expectations

Expectations preserve monotonicity; for g(x) ≥ h(x) ∀x ∈ X, E[g(X)] ≥ E[h(X)].

As a special case, let h(x) = 0. If g(x) ≥ 0, then E[g(X)] ≥ 0.

Expectations also preserve equality; for g(x) = h(x) ∀x ∈ X, E[g(X)] = E[h(X)].
Conditional Expectations

The expectation of a random variable Y conditional on (or given) X is defined analogously to the preceding formulations, but uses the conditional distribution f_{Y|X}(y|X = x) rather than the unconditional f_Y(y).

Conditional distributions are about changing the population that you are considering.

For discrete random variables:

E(Y|X = x) = \sum_{y \in Y} \Pr(Y = y|X = x) \, y = \sum_{y \in Y} f_{Y|X}(y|X = x) \, y.

For continuous random variables:

E(Y|X = x) = \int_{-\infty}^{\infty} y \, f_{Y|X}(y|X = x) \, dy = \int_{-\infty}^{\infty} y \, dF_{Y|X}(y|X = x)
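In the discrete case this is just a weighted average within the subpopulation where X = x. A sketch with a hypothetical joint pmf: divide each row of f_{X,Y} by the marginal f_X(x), then weight the y values.

```python
import numpy as np

# Hypothetical joint pmf over X in {0, 1} (rows) and Y in {1, 2, 3} (columns).
f_xy = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.25, 0.30]])
y = np.array([1, 2, 3])

f_x = f_xy.sum(axis=1)                     # marginal f_X(x)
f_y_given_x = f_xy / f_x[:, None]          # row x is f_{Y|X}(y | X = x)
print(f_y_given_x @ y)                     # E(Y | X = 0) = 2.0, E(Y | X = 1) ~ 2.42
```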
Conditional Expectations
Recall from the previous set of slides that, for independentrandom variables X and Y ,
fY |X(y|X = x) = fY (y) and fX|Y (x|Y = y) = fX(x)
Conditional Expectations
Recall from the previous set of slides that, for independentrandom variables X and Y ,
fY |X(y|X = x) = fY (y) and fX|Y (x|Y = y) = fX(x)
Hence,E(Y |X) = E(Y ) and E(X|Y ) = E(X)
This will be a heavily-used result later in the course.
Conditional Expectations

Let’s pause and consider what is random here.

First, there is one conditional distribution of Y given X = x, but a family of distributions of Y given X, since X takes many values and each value imparts its own conditional distribution for Y.

Y|X = x derives its randomness solely from the fact that Y is random; X is fixed at x here.

E(Y|X = x) is not random, since X is fixed and we are integrating over all possible values of Y.

E(Y|X) is random, since X is no longer fixed (i.e., it is random), though Y is integrated over.
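A sketch of this distinction under an assumed model Y = X + noise, so that E(Y|X) = X: E(Y|X = 3) is a single number, while E(Y|X) inherits the randomness of X and has its own distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(1, 7, size=100_000)       # X uniform on {1, ..., 6}
y = x + rng.normal(size=x.size)            # assumed model: E(Y | X) = X

print(y[x == 3].mean())                    # E(Y | X = 3) ~ 3, a fixed number
cond_exp = x.astype(float)                 # E(Y | X) = X, a random variable
print(cond_exp.var())                      # so it has its own variance, ~35/12
```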
Conditional Expectations

The unconditional expected value of a function of X and Y, g(x, y), can be written as

E[g(X, Y)] = \sum_{x \in X} \sum_{y \in Y} g(x, y) \, f_{X,Y}(x, y)

for discrete random variables and

E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \, f_{X,Y}(x, y) \, dy \, dx

for continuous random variables.
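The double sum is direct to compute; reusing the hypothetical joint pmf from above, with g(x, y) = xy:

```python
import numpy as np

f_xy = np.array([[0.10, 0.20, 0.10],       # hypothetical joint pmf
                 [0.05, 0.25, 0.30]])
x = np.array([0, 1])
y = np.array([1, 2, 3])

g = np.outer(x, y)                         # g(x, y) = x * y on the grid
print((g * f_xy).sum())                    # E[XY] = 1.45
```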
Conditional Expectations

The following holds for the conditional expectation of Y given X:

E[g(X)h(Y)|X] = g(X) E[h(Y)|X]

Question: Why does this hold when X is a random variable?
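A sketch of this property (the model below is an assumption, with g(x) = x^2 and h(y) = y): conditional on X = k, g(X) is the constant g(k) and factors out of the average.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.integers(1, 4, size=300_000)       # X in {1, 2, 3}
y = x + rng.normal(size=x.size)

for k in (1, 2, 3):
    mask = x == k
    lhs = (x[mask] ** 2 * y[mask]).mean()  # E[g(X) h(Y) | X = k]
    rhs = k ** 2 * y[mask].mean()          # g(k) E[h(Y) | X = k]
    print(k, lhs, rhs)                     # equal up to floating-point rounding
```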
Conditional Expectations

The law of iterated expectations (LIE) states that

E_Y(Y) = E_X[E(Y|X)],

which can be shown by

E_Y(Y) = \int \int y \, f_{X,Y}(x, y) \, dx \, dy
= \int \int y \, f_{Y|X}(y|x) \, f_X(x) \, dx \, dy
= \int \left[ \int y \, f_{Y|X}(y|x) \, dy \right] f_X(x) \, dx
= \int E(Y|X = x) \, f_X(x) \, dx
= E_X[E(Y|X)]
Conditional Expectations
Note that the LIE only holds when the necessary marginal moments exist.
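A Monte Carlo check of the LIE under an assumed model with X uniform on {1, ..., 6} and Y|X ~ Normal(X, 1), so that E_X[E(Y|X)] = E(X) = 3.5:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(1, 7, size=200_000)
y = rng.normal(loc=x, scale=1.0)

print(y.mean())                            # E(Y) ~ 3.5
# Inner step: E(Y | X = k) for each k; the outer unweighted mean is valid
# here because X puts equal probability on each value.
inner = np.array([y[x == k].mean() for k in range(1, 7)])
print(inner.mean())                        # E_X[E(Y | X)] ~ 3.5 as well
```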
Variance

The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of X:

Var(X) = E[(X − µ)^2]

Multiplying this out yields:

Var(X) = E(X^2 − 2µX + µ^2)
= E(X^2) − 2µ E(X) + µ^2
= E(X^2) − [E(X)]^2

The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = \sqrt{Var(X)}.

See that Var(aX + b) = a^2 Var(X).
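A sketch checking both the shortcut formula and the scaling rule on simulated draws (the Normal distribution with µ = 2, σ = 3 and the constants a = 5, b = 7 are assumptions):

```python
import numpy as np

x = np.random.default_rng(4).normal(loc=2.0, scale=3.0, size=200_000)

print((x ** 2).mean() - x.mean() ** 2)     # E(X^2) - [E(X)]^2 ~ Var(X) = 9
print(np.var(5 * x + 7))                   # ~25 * Var(X) = 225; the shift b drops out
```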
Conditional Variance

The conditional variance of Y given X is

Var(Y|X) = E[(Y − E(Y|X))^2 | X] = E(Y^2|X) − [E(Y|X)]^2

Note that

Var[g(X)h(Y)|X] = [g(X)]^2 Var[h(Y)|X]
Conditional Variance

Var(Y) = E_Y[(Y − E_Y(Y))^2]
= E_Y[(Y − E_{Y|X}(Y|X) + E_{Y|X}(Y|X) − E_Y(Y))^2]
= E_Y[(Y − E_{Y|X}(Y|X))^2] + E_Y[(E_{Y|X}(Y|X) − E_Y(Y))^2]
= E_X[E[(Y − E_{Y|X}(Y|X))^2 | X]] + E_X[E[(E_{Y|X}(Y|X) − E_Y(Y))^2 | X]]
= E_X[Var(Y|X)] + E_X[(E_{Y|X}(Y|X) − E_Y(Y))^2]
= E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]

(The cross term from expanding the square in the second line vanishes, since its inner conditional expectation is zero by the LIE.)
Conditional Variance

Var(Y) = E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]

So what?

Var(Y) ≥ Var[E_{Y|X}(Y|X)]

Conditioning on more X variables yields a (weakly) smaller variance.

Intuition: Adding more predictors can only make your guess of Y better. If you add something that isn’t relevant or too noisy, you can just ignore it (in a regression, e.g., you give that predictor a 0 coefficient).

See Casella and Berger, pp. 167–168, for a similar, detailed exposition.
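A simulation sketch of the decomposition, reusing the assumed model X uniform on {1, ..., 6} and Y|X ~ Normal(X, 1): the within-group variances average to E_X[Var(Y|X)] = 1, and the group means have variance Var[E(Y|X)] = Var(X) = 35/12.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.integers(1, 7, size=200_000)
y = rng.normal(loc=x, scale=1.0)

within = np.array([y[x == k].var() for k in range(1, 7)])    # Var(Y | X = k)
between = np.array([y[x == k].mean() for k in range(1, 7)])  # E(Y | X = k)
# Unweighted means/variances across k are valid because X is uniform.
print(within.mean() + between.var())       # ~1 + 35/12, about 3.92
print(y.var())                             # Var(Y), the same total
```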
Covariance and Correlation

The covariance of random variables X and Y is defined as

Cov(X, Y) ≡ σ_{XY} = E_{X,Y}[(X − E_X(X))(Y − E_Y(Y))] = E(XY) − µ_X µ_Y

We have

Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)

Note that covariance only measures the linear relationship between two random variables.
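A sketch of the shortcut form E(XY) − µ_X µ_Y, plus the linearity caveat: under the assumed setup below, X^2 depends on X exactly, yet their covariance is approximately zero because the relationship has no linear component.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500_000)               # symmetric around zero
y = 2 * x + rng.normal(size=x.size)

print((x * y).mean() - x.mean() * y.mean())   # E(XY) - mu_X mu_Y ~ Cov(X, Y) = 2
print(np.cov(x, x ** 2)[0, 1])                # ~0 despite exact dependence
```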
Covariance and Correlation

The correlation of random variables X and Y is defined as

ρ_{XY} = σ_{XY} / (σ_X σ_Y)

Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Correlation and covariance always have the same sign.
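A sketch of the normalization (the linear model below is an assumption): with Y = 2X + noise, σ_{XY} = 2, σ_X = 1, and σ_Y = √5, so ρ_{XY} = 2/√5 ≈ 0.894.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=200_000)
y = 2 * x + rng.normal(size=x.size)

rho = np.cov(x, y)[0, 1] / (x.std() * y.std())
print(rho)                                 # ~2 / sqrt(5), about 0.894
print(np.corrcoef(x, y)[0, 1])             # matches NumPy's built-in correlation
```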
Covariance and Correlation

If X and Y are independent, then

E[g(X)h(Y)] = E[g(X)] E[h(Y)]

Hence, for independent random variables, the covariance is 0.