ACTL2002/ACTL5101 Probability and Statistics: Week 3 Video Lecture Notes
© Katja Ignatieva
School of Risk and Actuarial Studies, Australian School of Business, University of New South Wales
[email protected]

Course outline: Probability: Weeks 1-4. Estimation: Weeks 5-6, Review. Hypothesis testing: Weeks 7-9. Linear regression: Weeks 10-12. Video lectures: Weeks 1, 2, 4 and 5.
Transcript
Numerical methods to summarize data
Introduction
Agenda:
- Special sampling distributions & sample mean and variance
- Numerical methods to summarize data: Introduction; Measures of location & spread; Numerical example
- Graphical procedures to summarize data: Summarizing data
Numerical methods to summarize data
Introduction
Population vs sample
Population: the large body of data;
Sample: a subset of the population.
Question: For each of the following four cases, would we refer to a population or a sample?
1. All the actuaries in Australia;
2. The temperature on 5, randomly chosen, days;
3. All NSW cars;
4. The basket of goods of each fifth customer on a given day.
Solution: 1. Population; 2. Sample; 3. Population; 4. Sample.
402/420
Numerical methods to summarize data
Introduction
Summarising data: Numerical approaches
Given a set of observations x1, x2, x3, ..., xn selected from a population (usually assumed i.i.d., independent and identically distributed).
Sorted data in ascending order: x(1), x(2), ..., x(n), such that x(1) is the smallest and x(n) is the largest.
Objectives:
- Understand the main features of data and summarise data (an essential first step in analysing data);
- Make inferences about the population (more on this later in the course).
403/420
Numerical methods to summarize data
Measures of location & spread
Measures of location
Used to estimate the central point of the sample; also called measures of central tendency.

The sample mean is given by:
    x̄ = (1/n) · Σ_{k=1}^{n} xk.

The population mean is given by:
    μX = Σ_{all x} pX(x) · x.

The 100α% trimmed mean is the average of the observations after discarding the lowest 100α% and the highest 100α%:
    x̃α = (x(⌊nα⌋+1) + ... + x(n−⌊nα⌋)) / (n − 2⌊nα⌋),
where ⌊nα⌋ is the greatest integer less than or equal to nα.
404/420
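As an illustration (a minimal Python sketch, not part of the original slides; the data values are made up), the sample mean and trimmed mean can be computed as:

```python
import math

def trimmed_mean(xs, alpha):
    """100*alpha% trimmed mean: discard the lowest and highest
    floor(n*alpha) observations, then average what remains."""
    xs = sorted(xs)
    n = len(xs)
    k = math.floor(n * alpha)
    kept = xs[k:n - k]
    return sum(kept) / len(kept)

data = [2, 4, 4, 5, 6, 7, 9, 100]   # hypothetical sample with one outlier
print(sum(data) / len(data))         # ordinary sample mean: 17.125
print(trimmed_mean(data, 0.125))     # drops x(1) = 2 and x(8) = 100
```

Note how much less sensitive the trimmed mean is to the single outlier than the ordinary sample mean.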
Numerical methods to summarize data
Measures of location & spread
Measures of spread
The sample variance:
s2 =1
n − 1·
n∑k=1
(xk − x)2 =1
n − 1·
(n∑
k=1
x2k +
n∑k=1
x2 − 2n∑
k=1
xkx
)
=1
n − 1·
(n∑
k=1
x2k − n · x2
).
The population variance:
σ2 = Var(X ) =∑all x
pX (x) · (x − µX )2 =∑all x
pX (x) · x2 − µ2X
Sample standard deviation: s =√s2.
Population standard deviation: σ =√σ2.
405/420
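The computational shortcut for s² above is easy to verify numerically; a minimal Python sketch with made-up data:

```python
def sample_variance(xs):
    """Definition: s^2 = 1/(n-1) * sum of (x_k - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Computational form: s^2 = 1/(n-1) * (sum of x_k^2 - n * xbar^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

data = [3.1, 2.7, 4.4, 5.0, 3.8]     # made-up observations, mean 3.8
s2 = sample_variance(data)
assert abs(s2 - sample_variance_shortcut(data)) < 1e-9
print(s2, s2 ** 0.5)                  # sample variance (0.875) and standard deviation
```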
Numerical methods to summarize data
Measures of location & spread
Quantiles
Pα, the αth quantile or (α×100)th percentile, satisfies:
    (1/n)·[number of xk < Pα] ≤ α ≤ (1/n)·[number of xk ≤ Pα],
approximated by linear interpolation as the ((n−1)α + 1)th observation.

Quartiles: Q1 (25th percentile) and Q3 (75th percentile).
Quantile function: FX⁻¹(u), u ∈ [0, 1], where FX(x) = u.

Question: What are the 0.025, 0.16, 0.5, 0.84 and 0.975 quantiles of the N(0,1) distribution?
Solution: They are −1.96, −1, 0, 1 and 1.96, respectively.
406/420
Numerical methods to summarize data
Measures of location & spread
Mode: The mode m is the value that maximises the p.m.f. pX(x) in the discrete case or the p.d.f. fX(x) in the continuous case.

Median, M:
    M = x((n+1)/2), if n is odd;
    M = (x(n/2) + x(n/2+1)) / 2, if n is even.

Median absolute deviation:
    MAD = median of the numbers {|xi − M|}.

Range:
    R = x(n) − x(1).

Interquartile range:
    IQR = Q3 − Q1.
407/420
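The median, MAD and range above can be sketched in Python (illustrative data, not from the slides):

```python
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]                        # x((n+1)/2) in 1-based terms
    return (xs[n // 2 - 1] + xs[n // 2]) / 2     # average of x(n/2) and x(n/2+1)

def mad(xs):
    """Median absolute deviation from the sample median."""
    m = median(xs)
    return median([abs(x - m) for x in xs])

data = [1, 3, 3, 6, 7, 8, 9]   # made-up sample, already sorted for readability
print(median(data))             # 6
print(mad(data))                # median of {5, 3, 3, 0, 1, 2, 3} = 3
print(max(data) - min(data))    # range R = x(n) - x(1) = 8
```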
Numerical methods to summarize data
Numerical example
Numerical example
An insurance company has incurred 26 claims with the following amounts, displayed in a stem-and-leaf plot:
- Each row corresponds to a bin.
- The number before | displays the number of thousands (or hundreds/tens, etc.).
- Each number after | displays the 3rd (or 2nd/1st) digit of an observation.
Note: rounding!
417/420
as the joint probability mass function of X, then:
    Σ_{i=1}^{∞} Σ_{j=1}^{∞} p_{X1,X2}(x1i, x2j) = 1.
504/562
ACTL2002/ACTL5101 Probability and Statistics: Week 3
The Bivariate Case
Introduction
Discrete Random Variables
The marginal p.m.f.s of X1 and X2 are, respectively,
    p_{X1}(x1i) = Σ_{j=1}^{∞} p_{X1,X2}(x1i, x2j)
and
    p_{X2}(x2j) = Σ_{i=1}^{∞} p_{X1,X2}(x1i, x2j)
(sum over the other random variable(s)).
Proof: use the Law of Total Probability.
505/562
The Bivariate Case
Introduction
Example: discrete random variables
An insurer offers both disability insurance (DI) and unemployment insurance (UI) to small companies.
Most companies buy DI and UI because of a large discount.
The claims are categorized as "no claims", "mild claims", and "severe claims".
Last year the 100 insured fell into the following categories:

    DI:  no   no   no      mild  mild  mild    severe  severe  severe
    UI:  no   mild severe  no    mild  severe  no      mild    severe
    #:   74   6    2       3     2     4       1       3       5

Question: Find the marginal p.m.f.s of DI and UI.

Solution:
    x         no                     mild                   severe
    pDI(x)    (74+6+2)/100 = 0.82    (3+2+4)/100 = 0.09     (1+3+5)/100 = 0.09
    pUI(x)    (74+3+1)/100 = 0.78    (6+2+3)/100 = 0.11     (2+4+5)/100 = 0.11
506/562
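A quick Python check of the marginal p.m.f.s above, with the joint counts entered as a dictionary (the 0/1/2 coding of no/mild/severe is our own choice):

```python
# Joint counts from the DI/UI example (100 insured companies);
# categories coded 0 = "no", 1 = "mild", 2 = "severe".
counts = {
    (0, 0): 74, (0, 1): 6, (0, 2): 2,    # keys are (DI, UI)
    (1, 0): 3,  (1, 1): 2, (1, 2): 4,
    (2, 0): 1,  (2, 1): 3, (2, 2): 5,
}
n = sum(counts.values())

# Marginal p.m.f.s: sum the joint p.m.f. over the other variable.
p_DI = [sum(c for (di, ui), c in counts.items() if di == k) / n for k in range(3)]
p_UI = [sum(c for (di, ui), c in counts.items() if ui == k) / n for k in range(3)]
print(p_DI)   # [0.82, 0.09, 0.09]
print(p_UI)   # [0.78, 0.11, 0.11]
```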
The Bivariate Case
Introduction
Continuous Random Variables
In the case where X1 and X2 are both continuous random variables, we set the joint density function of X as
    f_{X1,X2}(x1, x2) = ∂²/(∂x1 ∂x2) F_{X1,X2}(x1, x2),
and therefore the joint cumulative distribution function is given by:
    F_{X1,X2}(x1, x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} f_{X1,X2}(z1, z2) dz1 dz2.

Note:
    F_{X1,X2}(∞, ∞) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X1,X2}(z1, z2) dz1 dz2 = 1,
    F_{X1,X2}(−∞, −∞) = 0.
507/562
The Bivariate Case
Introduction
Continuous Random Variables
The marginal density functions of X1 and X2 are, respectively:
    f_{X1}(x1) = ∫_{−∞}^{∞} f_{X1,X2}(x1, z2) dz2   and   f_{X2}(x2) = ∫_{−∞}^{∞} f_{X1,X2}(z1, x2) dz1.

The marginal cumulative distribution functions of X1 and X2 are then, respectively:
    F_{X1}(x1) = ∫_{−∞}^{x1} f_{X1}(u) du   and   F_{X2}(x2) = ∫_{−∞}^{x2} f_{X2}(u) du,
or, alternatively:
    F_{X1}(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{x1} fX(u1, u2) du1 du2
and
    F_{X2}(x2) = ∫_{−∞}^{x2} ∫_{−∞}^{∞} fX(u1, u2) du1 du2.
508/562
The Bivariate Case
Introduction
Continuous Random Variables: example
The joint p.d.f. of X and Y is given by:
    f_{X,Y}(x, y) = 4·x·(1−y), for 0 ≤ x, y ≤ 1, and 0 otherwise.

a. The marginal p.d.f. of X is:
    fX(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_0^1 4·x·(1−y) dy = [4·x·(y − y²/2)]_0^1 = 2x.

b. The marginal c.d.f. of X is:
    FX(x) = ∫_{−∞}^{x} fX(z) dz = ∫_0^x 2z dz = [z²]_0^x = x², if 0 ≤ x ≤ 1,
and zero if x < 0 and one if x > 1.

c. The marginal p.d.f. of Y is:
    fY(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_0^1 4·x·(1−y) dx = [2·x²·(1−y)]_0^1 = 2·(1−y).

d. The marginal c.d.f. of Y is:
    FY(y) = ∫_{−∞}^{y} fY(z) dz = ∫_0^y 2·(1−z) dz = [2z − z²]_0^y = 2y − y², if 0 ≤ y ≤ 1,
and zero if y < 0 and one if y > 1.
509/562
Joint & Multivariate Distributions
- The Bivariate Case: Introduction; Exercises; Means, Variances, Covariances; Correlation coefficient; Conditional Distributions; The Bivariate Normal Distribution
- Laws: Law of Iterated Expectations; Conditional variance identity; Application & Exercise
- The Multivariate Case: Introduction
- Summarizing data: Exercises
- Summary
The Bivariate Case
Exercises
Exercise: Discrete case
Let X be the random variable taking one if there is a positive return on the asset portfolio and zero otherwise.
Let Y be the random variable for the claims for home insurance, which can take the values 0, 1, 2, and 3 for few, normal, many claims and a large number of claims due to floods, respectively.
The marginal probability mass functions of X and Y are:

    X = x   Pr(X = x)        Y = y   Pr(Y = y)
    0       1/2              0       1/8
    1       1/2              1       3/8
                             2       3/8
                             3       1/8

Question: What would be the joint probability mass function if X and Y are independent?
510/562
The Bivariate Case
Exercises
Exercise: Discrete case
Solution: If the two are independent, we would have:
    Pr(X = x, Y = y) = Pr(X = x) · Pr(Y = y).
For all X = x and Y = y the joint distribution, if they are independent, is described in the table below:

    Pr(X = x, Y = y)   Y = 0   Y = 1   Y = 2   Y = 3
    X = 0              1/16    3/16    3/16    1/16
    X = 1              1/16    3/16    3/16    1/16
511/562
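Using exact fractions, a minimal sketch confirming the product table above:

```python
from fractions import Fraction as F

p_X = {0: F(1, 2), 1: F(1, 2)}
p_Y = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

# Under independence, the joint p.m.f. is the product of the marginals.
joint = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

assert joint[(0, 0)] == F(1, 16)
assert joint[(0, 1)] == F(3, 16)
assert sum(joint.values()) == 1      # a valid p.m.f. sums to one
print(joint[(1, 3)])                 # 1/16
```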
The Bivariate Case
Exercises
Exercise: Discrete case
Suppose instead they are not independent and their joint distribution is described as:

    Pr(X = x, Y = y)   Y = 0   Y = 1   Y = 2   Y = 3
    X = 0              0       3/16    3/16    1/8
    X = 1              1/8     3/16    3/16    0

Question: Prove that X and Y are dependent.
Solution: We have Pr(Y = 3) = 1/8 and Pr(X = 1) = 1/2. However, when Y takes the value 3, the probability that X takes the value 1 is zero (the joint probability of Y = 3 and X = 1 is zero), so Pr(X = 1, Y = 3) = 0 ≠ Pr(X = 1) · Pr(Y = 3) = 1/16.
512/562
The Bivariate Case
Exercises
Example: Multinomial distribution
Suppose we have n independent trials with r outcomes with probabilities p1, p2, ..., pr.
The joint frequency distribution is given by:
    p_{N1,N2,...,Nr}(n1, n2, ..., nr) = n! / (n1! · n2! · ... · nr!) · p1^{n1} · p2^{n2} · ... · pr^{nr}.

The marginal distribution is (a Binomial distribution!) given by:
    p_{Ni}(ni) = Σ_{n1} ... Σ_{n_{i−1}} Σ_{n_{i+1}} ... Σ_{nr} p_{N1,N2,...,Nr}(n1, n2, ..., nr)
              *= (n choose ni) · pi^{ni} · (1 − pi)^{n − ni}.

Can do this by summing the joint p.m.f. over the other counts.
* Using the Binomial expansion (proof not required).
513/562
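The binomial marginal can be checked by brute force for a small case (a sketch; n = 4, r = 3, and the probabilities are arbitrary choices):

```python
from math import comb, factorial, isclose

def multinomial_pmf(ns, ps):
    """Joint p.m.f. n!/(n1!*...*nr!) * p1^n1 * ... * pr^nr."""
    coef = factorial(sum(ns))
    for k in ns:
        coef //= factorial(k)
    prob = float(coef)
    for k, p in zip(ns, ps):
        prob *= p ** k
    return prob

n, ps = 4, (0.2, 0.3, 0.5)
for n1 in range(n + 1):
    # Marginal of N1: sum the joint p.m.f. over every split of the rest.
    marg = sum(multinomial_pmf((n1, n2, n - n1 - n2), ps)
               for n2 in range(n - n1 + 1))
    binom = comb(n, n1) * ps[0] ** n1 * (1 - ps[0]) ** (n - n1)
    assert isclose(marg, binom)
print("marginal of N1 matches Binomial(4, 0.2)")
```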
The Bivariate Case
Exercises
Exercise: Continuous case
Now consider an example of a bivariate random vector [X, Y]ᵀ whose joint density function is:
    f_{X,Y}(x, y) = c·(x² + x·y), for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1,
and zero otherwise. To find the constant c, note that f must be a valid density, so that:
    1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = ∫_0^1 ∫_0^1 c·(x² + x·y) dx dy
      = c · ∫_0^1 [x³/3 + x²·y/2]_0^1 dy = c · [y/3 + y²/4]_0^1 = c · 7/12.
Hence, c = 12/7; then also f_{X,Y}(x, y) ≥ 0 for all x, y.
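A numerical sanity check (a sketch, not from the slides) that c = 12/7 normalises the density, using midpoint-rule integration over the unit square:

```python
# Midpoint-rule check that f(x, y) = c*(x^2 + x*y) integrates to 1
# over the unit square when c = 12/7.
c = 12 / 7
m = 400                      # 400 x 400 grid of cells
h = 1.0 / m
total = 0.0
for i in range(m):
    x = (i + 0.5) * h        # cell midpoint in x
    for j in range(m):
        y = (j + 0.5) * h    # cell midpoint in y
        total += c * (x * x + x * y) * h * h
print(total)                 # close to 1
```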
a. Question: Find the marginal densities.
b. Question: Find the joint distribution function.
514/562
The Bivariate Case
Exercises
Exercise: Continuous case
a. Solution: Knowing the constant, we can then determine the marginal densities. First the marginal density of X:
    fX(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_0^1 (12/7)·(x² + x·y) dy
          = (12/7)·(x² + x/2), for 0 ≤ x ≤ 1,
and zero otherwise; and for Y:
    fY(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_0^1 (12/7)·(x² + x·y) dx
          = (12/7)·(1/3 + y/2), for 0 ≤ y ≤ 1,
and zero otherwise.
515/562
The Bivariate Case
Exercises
Exercise: Continuous case
b. Solution: You can also determine the joint distribution function, for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, by:
    F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv = ∫_0^y ∫_0^x (12/7)·(u² + u·v) du dv
                  = ∫_0^y [(12/7)·(u³/3 + u²·v/2)]_0^x dv = ∫_0^y (12/7)·(x³/3 + x²·v/2) dv
                  = [(12/7)·(x³·v/3 + x²·v²/4)]_0^y = (12/7)·(x³·y/3 + x²·y²/4).

Hence:
    F_{X,Y}(x, y) = 0, if x < 0 or y < 0;
    F_{X,Y}(x, y) = (12/7)·(x³·y/3 + x²·y²/4), if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1;
    F_{X,Y}(x, y) = FX(x), if y > 1;
    F_{X,Y}(x, y) = FY(y), if x > 1.
516/562
The Bivariate Case
Exercises
Exercise: Continuous case
[Figure: four panels on the unit square: a surface plot of the joint distribution of (X, Y), the two marginals of X and Y, and the x-y region used in the probability calculation on slide 519.]
517/562
The Bivariate Case
Exercises
Exercise: Continuous case
You can then determine the marginal distributions:
    FX(x) = F_{X,Y}(x, 1) = 0, if x < 0;
                          = (12/7)·(x³/3 + x²/4), if 0 ≤ x ≤ 1;
                          = 1, if x > 1,
and
    FY(y) = F_{X,Y}(1, y) = 0, if y < 0;
                          = (12/7)·(y/3 + y²/4), if 0 ≤ y ≤ 1;
                          = 1, if y > 1.
Can you confirm the marginal densities are correct?
518/562
The Bivariate Case
Exercises
Exercise: Continuous case
It becomes straightforward to compute probability statements such as (using the lower right panel on slide 517):
    Pr(X < Y) = ∫_0^1 ∫_0^y (12/7)·(x² + x·y) dx dy
              = (12/7) · ∫_0^1 [x³/3 + x²·y/2]_0^y dy
              = (12/7) · ∫_0^1 (y³/3 + y³/2) dy
              = ∫_0^1 (12/7)·(5/6)·y³ dy = (12·5)/(7·6) · [y⁴/4]_0^1 = 5/14,
so that Pr(X > Y) = ∫_{−∞}^{∞} ∫_y^{∞} f_{X,Y}(x, y) dx dy = 9/14.
519/562
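The value Pr(X < Y) = 5/14 can likewise be checked numerically (a sketch: midpoint rule over the part of the unit square where x < y):

```python
# Midpoint-rule check of Pr(X < Y) = 5/14 for the density
# f(x, y) = (12/7)*(x^2 + x*y) on the unit square.
c = 12 / 7
m = 400
h = 1.0 / m
prob = 0.0
for i in range(m):
    x = (i + 0.5) * h
    for j in range(m):
        y = (j + 0.5) * h
        if x < y:                        # integrate only over the region x < y
            prob += c * (x * x + x * y) * h * h
print(prob)                              # close to 5/14 = 0.3571...
```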
The Bivariate Case
Means, Variances, Covariances
Means
Consider the bivariate random vector X = [X1 X2]ᵀ.
The mean of X is the vector whose elements are the means of X1 and X2, that is:
    E[X] = [E[X1], E[X2]]ᵀ = [μ1, μ2]ᵀ.

If X1, X2, ..., Xn are jointly distributed random variables with expectations E[Xi] for i = 1, ..., n, and Y is an affine function of the Xi, i.e.,
    Y = a + Σ_{i=1}^{n} bi·Xi,
then we have the additivity rule:
    E[Y] = E[a + Σ_{i=1}^{n} bi·Xi] = a + Σ_{i=1}^{n} E[bi·Xi] = a + Σ_{i=1}^{n} bi·E[Xi].
520/562
The Bivariate Case
Means, Variances, Covariances
Variances, Covariances
Recall: the variance of X is a measure of the spread of X.
Covariance is a measure of the joint spread of X1 and X2.

The variance of the random vector X is also called the variance-covariance matrix:
    Var(X) = [ Var(X1)       Cov(X1, X2) ]   =   [ σ1²   σ12 ]
             [ Cov(X1, X2)   Var(X2)     ]       [ σ12   σ2² ],
where the covariance is defined as:
    Cov(X1, X2) ≡ σ12 = E[(X1 − μ1)·(X2 − μ2)]
                      = E[X1·X2 − X1·μ2 − μ1·X2 + μ1·μ2]
                      = E[X1·X2] − E[X1]·E[X2].
Note: Cov(Xi, Xi) = σii = σi², and covariance is only defined for a pair of random variables.
521/562
The Bivariate Case
Means, Variances, Covariances
Example: Consider the example from slide 506.
[Figure: bar chart of the joint p.m.f. of DI and UI over the categories no/mild/severe.]

Question: Is the covariance positive or negative?

Let "no" = 0, "mild" = 1, and "severe" = 2.
Question: Calculate the means of X1 = DI and X2 = UI.
Solution:
    E[X1] = (3+2+4)/100 · 1 + (1+3+5)/100 · 2 = 0.27.
    E[X2] = (6+2+3)/100 · 1 + (2+4+5)/100 · 2 = 0.33.
Question: Calculate the covariance between X1 and X2.
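A quick Python computation for the questions above, using the counts from slide 506 with the 0/1/2 coding:

```python
# Joint counts from slide 506, with "no" = 0, "mild" = 1, "severe" = 2.
counts = {
    (0, 0): 74, (0, 1): 6, (0, 2): 2,     # keys are (DI, UI)
    (1, 0): 3,  (1, 1): 2, (1, 2): 4,
    (2, 0): 1,  (2, 1): 3, (2, 2): 5,
}
n = sum(counts.values())

E1 = sum(di * c for (di, ui), c in counts.items()) / n      # E[X1] = 0.27
E2 = sum(ui * c for (di, ui), c in counts.items()) / n      # E[X2] = 0.33
E12 = sum(di * ui * c for (di, ui), c in counts.items()) / n
cov = E12 - E1 * E2
print(cov)    # 0.36 - 0.27*0.33 = 0.2709 > 0: the covariance is positive
```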
The Bivariate Case
Means, Variances, Covariances
Let X ∼ Beta(0.2, 1) (the probability of a claim) and Y|X ∼ NB(3, X) (so Y ∼ Beta-Negative-Binomial). Home insurance: an insured is qualified as a bad risk if there are 3 claims within 50 quarters.
Question: Does it have a negative or positive covariance?
524/562
The Bivariate Case
Means, Variances, Covariances
Properties of Covariance
If X and Y are jointly distributed random variables with expectations μX and μY, the covariance of X and Y is:
    Cov(X, Y) = E[(X − μX)·(Y − μY)]
              = E[X·Y − X·μY − Y·μX + μX·μY]
              = E[X·Y] − μX·μY.

If X and Y are independent:
    Cov(X, Y) = E[X·Y] − μX·μY *= E[X]·E[Y] − μX·μY = 0.
* using independence of X, Y.
525/562
The Bivariate Case
Means, Variances, Covariances
Properties of Covariance
Let X, Y, Z be random variables and a, b ∈ ℝ; we have:
    Cov(a + X, Y) = E[(a + X − (a + μX))·(Y − μY)]
                  = E[(X − μX)·(Y − μY)] = Cov(X, Y),
    Cov(a·X, b·Y) = E[(a·X − a·μX)·(b·Y − b·μY)]
                  = E[a·(X − μX)·b·(Y − μY)]
                  = a·b·E[(X − μX)·(Y − μY)] = a·b·Cov(X, Y),
    Cov(X, Y + Z) = E[(X − μX)·(Y + Z − μY − μZ)]
                  = E[(X − μX)·(Y − μY)] + E[(X − μX)·(Z − μZ)]
                  = Cov(X, Y) + Cov(X, Z).
The Bivariate Case
Conditional Distributions
Application: an imperfect particle counter
Define the random variable N as the number of incoming claims and X as the number of claims paid. The probability that a claim is fraudulent is q = 1 − p, and the number of claims paid is Binomial:
    (X | N = n) ∼ Binomial(n, p).
If the number of incoming claims follows a Poisson distribution (with parameter λ), then the number of claims paid turns out to also be Poisson, with parameter λ·p. This is an example of "thinning" of a Poisson probability.
We will see more on thinning of a Poisson probability in ACTL2003/5103 using Markov chains.
Proof: see the next slides.
540/562
The Bivariate Case
Conditional Distributions
Application: an imperfect particle counter
Proof: the law of total probability (why can we apply it here?) gives:
    Pr(X = k) = Σ_{n=0}^{∞} Pr(X = k | N = n) · Pr(N = n)
              = Σ_{n=k}^{∞} (n choose k) · p^k · (1−p)^{n−k} · λ^n · e^{−λ} / n!,   since n ≥ k,
              = Σ_{n=k}^{∞} n! / ((n−k)! · k!) · p^k · (1−p)^{n−k} · λ^n · e^{−λ} / n!;
continues on the next slide.
541/562
The Bivariate Case
Conditional Distributions
Application: an imperfect particle counter
Now (making the change of variables j = n − k in the third line):
    Pr(X = k) = Σ_{n=k}^{∞} n! / ((n−k)! · k!) · p^k · (1−p)^{n−k} · λ^n · e^{−λ} / n!
              = ((λ·p)^k / k!) · e^{−λ} · Σ_{n=k}^{∞} λ^{n−k} · (1−p)^{n−k} / (n−k)!
              = ((λ·p)^k / k!) · e^{−λ} · Σ_{j=0}^{∞} (λ·(1−p))^j / j!
             *= ((λ·p)^k / k!) · e^{−λ} · e^{λ·(1−p)} = ((λ·p)^k / k!) · e^{−λ·p},
which is the p.m.f. of a Poisson(λ·p) random variable.
* using the exponential series exp(x) = Σ_{i=0}^{∞} x^i / i!, with x = λ·(1−p).
542/562
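The thinning result can also be checked by simulation (a sketch; λ = 4 and p = 0.6 are arbitrary choices, and the Poisson sampler uses Knuth's product method):

```python
import math
import random

random.seed(1)
lam, p, trials = 4.0, 0.6, 100_000

def poisson(lam):
    """Knuth's product method; adequate for small lambda."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod >= limit:
        k += 1
        prod *= random.random()
    return k

paid = []
for _ in range(trials):
    n = poisson(lam)                                   # incoming claims
    # each incoming claim is paid independently with probability p
    paid.append(sum(1 for _ in range(n) if random.random() < p))

mean = sum(paid) / trials
var = sum((x - mean) ** 2 for x in paid) / trials
print(mean, var)   # both near lam*p = 2.4 (a Poisson mean equals its variance)
```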
The Bivariate Case
The Bivariate Normal Distribution
The Bivariate Normal Distribution
Suppose [X, Y]ᵀ has a bivariate normal distribution; then its density is given by:
    f_{X,Y}(x, y) = 1 / (2π·σX·σY·√(1−ρ²)) · exp(−A / (2·(1−ρ²))),
where
    A = ((x − μX)/σX)² − 2ρ·((x − μX)/σX)·((y − μY)/σY) + ((y − μY)/σY)².
543/562
The Bivariate Case
The Bivariate Normal Distribution
The following results are important although quite tedious to show (see section 5.10 of W+ (7th ed.) for some of the derivation):
1. The marginals are: X ∼ N(μX, σX²) and Y ∼ N(μY, σY²).
2. The conditional distributions are:
    (Y | X = x) ∼ N(μY + ρ·(x − μX)·σY/σX, σY²·(1−ρ²))   and
    (X | Y = y) ∼ N(μX + ρ·(y − μY)·σX/σY, σX²·(1−ρ²)).
3. The correlation coefficient between X and Y is: ρ(X, Y) = ρ.
544/562
The Bivariate Case
The Bivariate Normal Distribution
Simulating the multivariate normal distribution
Bivariate case: use properties 1 & 2 to simulate from i.i.d. standard normal random variables:
    X = μX + σX·Z1,
    Y = μY + σY·ρ·Z1 + σY·√(1−ρ²)·Z2,
where Z1 and Z2 are i.i.d. N(0, 1).

OPTIONAL: In the case of a multivariate normal, let Z = [Z1 ... Zn]ᵀ be i.i.d. N(0, 1); we have:
- The Cholesky decomposition: A·Aᵀ = Σ (Σ is the variance-covariance matrix).
- Then X = μ + A·Z.
545/562
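A sketch of the bivariate recipe above, checking that the sample correlation recovers ρ (the parameter values are arbitrary):

```python
import math
import random

random.seed(42)
mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7   # arbitrary parameters

xs, ys = [], []
for _ in range(50_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)    # i.i.d. N(0, 1)
    xs.append(mu_x + s_x * z1)
    ys.append(mu_y + s_y * rho * z1 + s_y * math.sqrt(1 - rho ** 2) * z2)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
sx = (sum((a - mx) ** 2 for a in xs) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in ys) / n) ** 0.5
r = cov / (sx * sy)
print(r)   # sample correlation, close to rho = 0.7
```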
Laws
Law of Iterated Expectations
Law of Iterated Expectations
Note: E[X | Y = y] is a constant, but E[X | Y] is a random variable.
For any two random variables X and Y, we have the law of iterated expectations:
    E[E[Y | X]] = E[Y].
To prove this in the continuous case, first consider:
    E[E[Y | X]] = ∫_{−∞}^{∞} E[Y | X = x] · fX(x) dx
                = ∫_{−∞}^{∞} (∫_{−∞}^{∞} y · f_{Y|X}(y|x) dy) · fX(x) dx.
546/562
Laws
Law of Iterated Expectations
Interchanging the order of integration, we have:
    E[E[Y | X]] = ∫_{−∞}^{∞} y · ∫_{−∞}^{∞} f_{Y|X}(y|x) · fX(x) dx dy   (the inner integral equals fY(y))
               *= ∫_{−∞}^{∞} y · fY(y) dy
                = E[Y].
* using the law of total probability (why can we use it here?).
547/562
Laws
Conditional variance identity
Conditional variance identity
Another important result is the conditional variance identity:
    Var(Y) = Var(E[Y | X]) + E[Var(Y | X)].
Proof (* using the law of iterated expectations):
    Var(Y) = E[Y²] − (E[Y])²
          *= E[E[Y² | X]] − (E[E[Y | X]])²
           = E[E[Y² | X]] − E[(E[Y | X])²] + E[(E[Y | X])²] − (E[E[Y | X]])²
           = E[Var(Y | X)] + Var(E[Y | X]).
The proof can also be found in section 5.11 of W+ (7th ed.).
548/562
Laws
Application & Exercise
Application: Random Sums
An insurance company usually has uncertainty in both the number of claims and the amount of each claim filed.
Denote by S the total claim size, by Xi the individual claim sizes, and by N the total number of claims.
We are interested in the (distribution,) mean and variance of a random sum defined as:
    S = X1 + X2 + ... + XN,
where both the Xi's and N are random variables.
We assume all the Xi are independent and identically distributed, and also independent of N.
549/562
Laws
Application & Exercise
Application: Random Sums
Mean of S: The mean of the aggregate claims is:
    E[S] = E[Xi] · E[N].
This is straightforward:
    E[S] = E[E[S | N]]
         = E[E[Σ_{i=1}^{N} Xi | N]]
         = E[Σ_{i=1}^{N} E[Xi | N]]
         = E[N · E[Xi | N]]
        *= E[Xi] · E[N].
* using independence of the Xi and N.
550/562
Laws
Application & Exercise
Application: Random Sums
Variance of S: The variance of the aggregate claims is:
    Var(S) = (E[Xi])² · Var(N) + E[N] · Var(Xi).
This is also straightforward to show:
    Var(S) *= E[Var(S | N)] + Var(E[S | N])
            = E[Var(Σ_{i=1}^{N} Xi | N)] + Var(E[Xi] · N)
          **= E[N · Var(Xi)] + (E[Xi])² · Var(N)
            = E[N] · Var(Xi) + (E[Xi])² · Var(N).
* using the conditional variance identity; ** using independence between the Xi and N (Var(Xi) and E[Xi] are constants).
551/562
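Both moment formulas can be checked by simulating the random sum (a sketch; N ~ Binomial(5, 0.4) and exponential claim sizes with mean 10 are arbitrary choices, not from the slides):

```python
import random

random.seed(7)
trials = 100_000
n_tr, q = 5, 0.4        # N ~ Binomial(5, 0.4): E[N] = 2, Var(N) = 1.2
mean_x = 10.0           # X_i ~ Exponential with mean 10: Var(X_i) = 100

ss = []
for _ in range(trials):
    n = sum(1 for _ in range(n_tr) if random.random() < q)   # draw N
    ss.append(sum(random.expovariate(1 / mean_x) for _ in range(n)))

m = sum(ss) / trials
v = sum((s - m) ** 2 for s in ss) / trials
print(m, v)   # near E[S] = 2*10 = 20 and Var(S) = 10^2 * 1.2 + 2 * 100 = 320
```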
Laws
Application & Exercise
Application: Random Sums
Moment Generating Function of S: The m.g.f. of the aggregate claims is given by:
    MS(t) = MN(log(MX(t))).
Finding the m.g.f. is also straightforward:
    MS(t) = E[e^{tS}] = E[E[e^{tS} | N]]
          = E[(MX(t))^N] = E[e^{N·log(MX(t))}]
          = MN(log(MX(t))).
Note that when the number of claims has a Poisson distribution, the resulting total claim amount S is said to have a Compound Poisson distribution.
552/562
Laws
Application & Exercise
Exercise
Let X ∼ Gamma(α, β) and Y | X ∼ EXP(1/X).
a. Question: Find E[Y]. (Note: E[X] = α/β; EXP(λ) = Gamma(1, λ).)
b. Question: Find Var(Y). (Note: Var(X) = α/β².)

a. Solution:
    E[Y] = E[E[Y | X]] = E[X] = α/β.

b. Solution:
    Var(Y) = Var(E[Y | X]) + E[Var(Y | X)]
           = Var(X) + E[X²]
           = Var(X) + Var(X) + (E[X])²
           = α/β² + α/β² + (α/β)² = (2α + α²)/β².
553/562
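A Monte Carlo check of this exercise (a sketch; α = 2, β = 4 are arbitrary, giving E[Y] = 0.5 and Var(Y) = (2·2 + 2²)/4² = 0.5; note that Python's gammavariate takes a scale parameter, hence the 1/beta):

```python
import random

random.seed(3)
alpha, beta = 2.0, 4.0
trials = 100_000

ys = []
for _ in range(trials):
    # gammavariate takes a SCALE parameter, so scale = 1/beta gives rate beta
    x = random.gammavariate(alpha, 1 / beta)
    ys.append(random.expovariate(1 / x))      # Y | X = x exponential with mean x

m = sum(ys) / trials
v = sum((y - m) ** 2 for y in ys) / trials
print(m, v)   # near E[Y] = alpha/beta = 0.5 and Var(Y) = (2*alpha + alpha^2)/beta^2 = 0.5
```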
The Multivariate Case
Introduction
The Multivariate Case
Let X = [X1, X2, ..., Xn]ᵀ be a random vector with n elements. The joint distribution function (DF) of X is denoted by:
    F_{X1,X2,...,Xn}(x1, ..., xn) = Pr(X1 ≤ x1, ..., Xn ≤ xn).
In the continuous case, we define the joint density function of X as:
    f_{X1,X2,...,Xn}(x1, ..., xn) = ∂ⁿ/(∂x1 ... ∂xn) F_{X1,X2,...,Xn}(x1, ..., xn).
554/562
The Multivariate Case
Introduction
The joint DF is given by:
    F_{X1,X2,...,Xn}(x1, ..., xn) = ∫_{−∞}^{xn} ... ∫_{−∞}^{x1} f_{X1,X2,...,Xn}(z1, ..., zn) dz1 ... dzn.

To derive marginal p.m.f.s or densities, simply evaluate (sum or integrate) over all of the region except for the variable of interest. For example, in the continuous case the marginal density of Xk, for k = 1, 2, ..., n, is given by:
    f_{Xk}(xk) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f_{X1,X2,...,Xn}(z1, ..., xk, ..., zn) Π_{j≠k} dzj.
555/562
The Multivariate Case
Introduction
Independent Random Variables
The random variables X1, X2, ..., Xn are said to be independent if their joint distribution function can be written as the product of their marginal distribution functions:
    F_{X1,...,Xn}(x1, ..., xn) = F_{X1}(x1) · F_{X2}(x2) · ... · F_{Xn}(xn).
Summarizing data
Exercises
Exercise: summarizing data
An insurer assumes that the time between claims isexponential distributed. A reinsurer pays out when the insurerhas two or more claims within two years. The distribution ofinterest is Gamma(2,3).