Faculty of Actuaries Institute of Actuaries

EXAMINATION

12 April 2005 (am)

Subject CT3

Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 13 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

Faculty of Actuaries CT3 A2005 Institute of Actuaries

1 Calculate the sample mean and standard deviation of the following claim amounts (£):

534 671 581 620 401 340 980 845 550 690 [3]

2 Suppose A, B and C are events with $P(A) = \tfrac{1}{2}$, $P(B) = \tfrac{1}{2}$, $P(C) = \tfrac{1}{3}$, $P(A \cup B) = \tfrac{3}{4}$, $P(A \cap C) = \tfrac{1}{6}$, $P(B \cap C) = \tfrac{1}{6}$ and $P(A \cap B \cap C) = \tfrac{1}{12}$.

(a) Determine whether or not the events A and B are independent.

(b) Calculate the probability $P(A \cup B \cup C)$.

[4]

3 Claim sizes in a certain insurance situation are modelled by a distribution with moment generating function M(t) given by

$M(t) = (1 - 10t)^{-2}$.

Show that $E[X^2] = 600$ and find the value of $E[X^3]$. [3]

4 Consider a random sample of size 16 taken from a normal distribution with mean $\mu = 25$ and variance $\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$.

State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a value greater than 26. [3]

5 Consider a random sample of size 21 taken from a normal distribution with mean $\mu = 25$ and variance $\sigma^2 = 4$. Let the sample variance be denoted $S^2$.

State the distribution of the statistic $5S^2$ and hence find the variance of the statistic $S^2$. [3]

6 In a survey conducted by a mail order company a random sample of 200 customers yielded 172 who indicated that they were highly satisfied with the delivery time of their orders.

Calculate an approximate 95% confidence interval for the proportion of the company's customers who are highly satisfied with delivery times. [3]

7 For a group of policies the total number of claims arising in a year is modelled as a Poisson variable with mean 10. Each claim amount, in units of £100, is independently modelled as a gamma variable with parameters $\alpha = 4$ and $\lambda = 1/5$.

Calculate the mean and standard deviation of the total claim amount. [5]

8 The distribution of claim size under a certain class of policy is modelled as a normal random variable, and previous years' records indicate that the standard deviation is £120.

(i) Calculate the width of a 95% confidence interval for the mean claim size if a sample of size 100 is available. [2]

(ii) Determine the minimum sample size required to ensure that a 95% confidence interval for the mean claim size is of width at most £10 . [2]

(iii) Comment briefly on the comparison of the confidence intervals in (i) and (ii) with respect to widths and sample sizes used. [1]

[Total 5]

9 Let $X_1, \ldots, X_n$ denote a large random sample from a distribution with unknown population mean $\mu$ and known standard deviation 3. The null hypothesis $H_0: \mu = 1$ is to be tested against the alternative hypothesis $H_1: \mu > 1$, using a test based on the sample mean with a critical region of the form $\bar{X} > k$, for a constant k.

It is required that the probability of rejecting $H_0$ when $\mu = 0.8$ should be approximately 0.05, and the probability of not rejecting $H_0$ when $\mu = 1.2$ should be approximately 0.1.

(i) Show that the test requires

$\Phi\left(\frac{k - 0.8}{3/\sqrt{n}}\right) = 0.95$ and $\Phi\left(\frac{k - 1.2}{3/\sqrt{n}}\right) = 0.10$

where $\Phi$ is the standard normal distribution function. [4]

(ii) The values for the sample size n and the critical value k which satisfy the requirements of part (i) are n = 482 and k = 1.025 (you are not asked to verify these values).

Calculate the approximate level of significance of the test, and comment on the value. [3]

[Total 7]

10 A model used for claim amounts (X, in units of £10,000) in certain circumstances has the following probability density function, f(x), and cumulative distribution function, F(x):

$f(x) = \frac{5 \times 10^5}{(10 + x)^6}, \quad x > 0; \qquad F(x) = 1 - \left(\frac{10}{10 + x}\right)^5.$

You are given the information that the distribution of X has mean 2.5 units (£25,000) and standard deviation 3.23 units (£32,300).

(i) Describe briefly the nature of a model for claim sizes for which the standard deviation can be greater than the mean. [2]

(ii) (a) Show that we can obtain a simulated observation of X by calculating

$x = 10\left[(1 - r)^{-0.2} - 1\right]$

where r is an observation of a random variable which is uniformly distributed on (0,1).

(b) Explain why we can just as well use the formula

$x = 10\left(r^{-0.2} - 1\right)$

to obtain a simulated observation of X.

(c) Calculate the missing values for the simulated claim amounts in the table below (which has been obtained using the method in (ii)(b) above):

r         Claim (£)
0.7423      6,141
0.0291    102,872
0.2770     29,272
0.5895     11,148
0.1131     54,635
0.9897        207
0.6875      7,782
0.8525      3,243
0.0016          ?
0.5154          ?

[5] [Total 7]

11 Twenty insects were used in an experiment to examine the effect on their activity level, y, of 3 standard preparations of a chemical. The insects were randomly assigned, 4 to receive each of the preparations and 8 to remain untreated as controls. Their activity levels were metered from vibrations in a test chamber and were as follows:

                Activity levels (y)        Totals
Control         43 40 65 51 33 39 54 62    387
Preparation A   73 55 61 65                254
Preparation B   84 63 51 72                270
Preparation C   46 91 84 71                292

For these data $\sum y$ = 1,203, $\sum y^2$ = 77,249.

(i) Conduct an analysis of variance test to establish whether the data indicate significant differences amongst the results for the four treatments. [7]

(ii) (a) Complete the following table of residuals for the data and analysis in part (i) above:

Control         ?      ?      16.6   2.6    -15.4  -9.4   5.6    13.6
Preparation A   9.5    -8.5   -2.5   1.5
Preparation B   16.5   ?      -16.5  4.5
Preparation C   ?      ?      ?      -2

(b) Make a rough plot of the residuals against the treatment means.

(c) State the assumptions underlying the analysis of variance test conducted in part (i).

(d) Comment on how well the data conform to these assumptions in the light of the residual plot. [8]

(iii) It is suggested that any differences can be explained in terms of a difference between controls on the one hand and treated groups on the other.

Comment on any evidence for this and state how you would formally test for this effect (but do not carry out the test). [4]

[Total 19]

12 (i) A random variable Y has a Poisson distribution with parameter $\lambda$ but there is a restriction that zero counts cannot occur. The distribution of Y in this case is referred to as the zero-truncated Poisson distribution.

(a) Show that the probability function of Y is given by

$p(y) = \frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})} \quad (y = 1, 2, \ldots).$

(b) Show that $E[Y] = \lambda/(1 - e^{-\lambda})$.

[4]

(ii) (a) Let $y_1, \ldots, y_n$ denote a random sample from the zero-truncated Poisson distribution.

Show that the maximum likelihood estimate of $\lambda$ may be determined by the solution to the following equation:

$\bar{y} - \lambda - \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = 0,$

and deduce that the maximum likelihood estimate is the same as the method of moments estimate.

(b) Obtain an expression for the Cramer-Rao lower bound (CRlb) for the variance of an unbiased estimator of $\lambda$.

[9]

(iii) The following table gives the numbers of occupants in 2,423 cars observed on a road junction during a certain time period on a weekday morning.

Number of occupants   1       2     3     4    5    6
Frequency of cars     1,486   694   195   37   10   1

The above data were modelled by a zero-truncated Poisson distribution as given in (i).

The maximum likelihood estimate of $\lambda$ is $\hat{\lambda} = 0.8925$ and the Cramer-Rao lower bound on variance at $\lambda = 0.8925$ is $5.711574 \times 10^{-4}$ (you do not need to verify these results).

(a) Obtain the expected frequencies for the fitted model, and use a $\chi^2$ goodness-of-fit test to show that the model is appropriate for the data.

(b) Calculate an approximate 95% confidence interval for $\lambda$ and hence calculate a 95% confidence interval for the mean of the zero-truncated Poisson distribution.

[9] [Total 22]

13 As part of an investigation into health service funding a working party was concerned with the issue of whether mortality rates could be used to predict sickness rates. Data on standardised mortality rates and standardised sickness rates were collected for a sample of 10 regions and are shown in the table below:

Region   Mortality rate m (per 10,000)   Sickness rate s (per 1,000)
1        125.2                           206.8
2        119.3                           213.8
3        125.3                           197.2
4        111.7                           200.6
5        117.3                           189.1
6        100.7                           183.6
7        108.8                           181.2
8        102.0                           168.2
9        104.7                           165.2
10       121.1                           228.5

Data summaries:

$\sum m$ = 1136.1, $\sum m^2$ = 129,853.03, $\sum s$ = 1934.2, $\sum s^2$ = 377,700.62, $\sum ms$ = 221,022.58

(i) Calculate the correlation coefficient between the mortality rates and the sickness rates and determine the probability-value for testing whether the underlying correlation coefficient is zero against the alternative that it is positive. [4]

(ii) Noting the issue under investigation, draw an appropriate scatterplot for these data and comment on the relationship between the two rates. [3]

(iii) Determine the fitted linear regression of sickness rate on mortality rate and test whether the underlying slope coefficient can be considered to be as large as 2.0. [5]

(iv) For a region with mortality rate 115.0, estimate the expected sickness rate and calculate 95% confidence limits for this expected rate. [4]

[Total 16]

END OF PAPER

Faculty of Actuaries Institute of Actuaries

EXAMINATION

April 2005

Subject CT3

Probability and Mathematical Statistics Core Technical

EXAMINERS' REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

M Flaherty

Chairman of the Board of Examiners

15 June 2005

Subject CT3 (Probability and Mathematical Statistics Core Technical) April 2005

Examiners' Report

1 $\sum x = 6212$, $\sum x^2 = 4186784$

$\bar{x} = \frac{6212}{10}$ = £621.20

$s = \sqrt{\frac{1}{9}\left(4186784 - \frac{6212^2}{10}\right)} = \sqrt{\frac{327889.6}{9}}$ = £190.87
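As a cross-check that is not part of the original report, the sample mean and standard deviation for Question 1 can be reproduced from the ten claim amounts with Python's standard library. The variable names are illustrative only.

```python
import statistics

# Claim amounts (£) from Question 1 of the April 2005 paper.
claims = [534, 671, 581, 620, 401, 340, 980, 845, 550, 690]

mean = statistics.mean(claims)   # sample mean
sd = statistics.stdev(claims)    # sample standard deviation (divisor n - 1)

print(f"mean = £{mean:.2f}, s.d. = £{sd:.2f}")
```

`statistics.stdev` uses the divisor n - 1, matching the factor 1/9 in the solution.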

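2 (a) $P(A \cap B) = P(A) + P(B) - P(A \cup B) = \tfrac{1}{2} + \tfrac{1}{2} - \tfrac{3}{4} = \tfrac{1}{4}$

$P(A)P(B) = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = \tfrac{1}{4}$, so the events A and B are independent as $P(A \cap B) = P(A)P(B)$.

(b) $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

$= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} - \tfrac{1}{6} - \tfrac{1}{6} + \tfrac{1}{12} = \tfrac{5}{6}$

[OR $P(A \cup B \cup C) = P(A \cup B) + P(C) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) = \tfrac{3}{4} + \tfrac{1}{3} - \tfrac{1}{6} - \tfrac{1}{6} + \tfrac{1}{12} = \tfrac{5}{6}$.]

[OR Use a Venn diagram]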
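The two parts of Question 2 can also be checked with exact rational arithmetic. The sketch below is an illustration added to the report, with hypothetical variable names; it applies the addition rule for two events and then inclusion-exclusion for three.

```python
from fractions import Fraction

# Probabilities given in Question 2.
p_a, p_b, p_c = Fraction(1, 2), Fraction(1, 2), Fraction(1, 3)
p_a_or_b = Fraction(3, 4)
p_ac, p_bc, p_abc = Fraction(1, 6), Fraction(1, 6), Fraction(1, 12)

# (a) P(A and B) from the addition rule, then the independence check.
p_ab = p_a + p_b - p_a_or_b
independent = (p_ab == p_a * p_b)

# (b) Inclusion-exclusion for the union of three events.
p_union = p_a + p_b + p_c - p_ab - p_ac - p_bc + p_abc

print(p_ab, independent, p_union)
```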

3 $M(t) = (1 - 10t)^{-2}$

$M'(t) = (-2)(-10)(1 - 10t)^{-3} = 20(1 - 10t)^{-3}$

$M''(t) = (-60)(-10)(1 - 10t)^{-4} = 600(1 - 10t)^{-4}$. Putting t = 0 gives $E[X^2] = 600$.

$M'''(t) = (-2400)(-10)(1 - 10t)^{-5} = 24000(1 - 10t)^{-5}$. Putting t = 0 gives $E[X^3] = 24000$.

[OR use the power series expansion $M(t) = 1 + 20t + 600t^2/2! + 24000t^3/3! + \cdots$]

[OR use the result on $E[X^r]$ for a gamma(2, 0.1) variable in the Yellow Book]
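The gamma route mentioned for Question 3 can be sketched numerically. The code below assumes the standard result that a gamma variable with parameters alpha and lambda has raw moments E[X^r] = alpha(alpha + 1)...(alpha + r - 1)/lambda^r for a positive integer r; the function name is hypothetical.

```python
from fractions import Fraction

def gamma_raw_moment(alpha, lam, r):
    """E[X^r] for a gamma(alpha, lam) variable, r a positive integer."""
    moment = Fraction(1)
    for j in range(r):
        moment *= Fraction(alpha + j) / Fraction(lam)
    return moment

lam = Fraction(1, 10)   # gamma(2, 0.1), whose MGF is (1 - 10t)^(-2)
second = gamma_raw_moment(2, lam, 2)
third = gamma_raw_moment(2, lam, 3)
print(second, third)
```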

4 $\bar{X} \sim N(25, 0.25)$

$P(\bar{X} > 26) = P\left(Z > \frac{26 - 25}{0.5}\right) = P(Z > 2) = 1 - 0.97725 = 0.02275$

5 $\frac{(n - 1)S^2}{\sigma^2} = 5S^2 \sim \chi^2_{20}$

$V[5S^2]$ = variance of $\chi^2_{20}$ = 40, so $V[S^2] = 40/25 = 1.6$

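6 $\hat{p} = \frac{x}{n} = \frac{172}{200} = 0.86$

95% CI is $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

$\Rightarrow 0.86 \pm 1.96(0.0245) \Rightarrow 0.86 \pm 0.048$ or (0.812, 0.908)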

7 Let $S = X_1 + X_2 + \cdots + X_N$ be the total claim amount.

Note that E[N] = V[N] = 10, E[X] = 4/(1/5) = 20, $V[X] = 4/(1/5)^2 = 100$

E[S] = E[N]E[X] = 10(20) = 200

so the mean of the total claim amount = £20,000.

$V[S] = E[N]V[X] + V[N]\{E[X]\}^2 = 10(100) + 10(20)^2 = 5000$

so s.d.[S] = 70.71, i.e. the s.d. of the total claim amount = £7,071
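The compound Poisson calculation in Question 7 can be reproduced as follows. This is a minimal sketch added for illustration, using the same formulae E[S] = E[N]E[X] and V[S] = E[N]V[X] + V[N]{E[X]}^2, with amounts in units of £100 as in the question.

```python
import math

mean_n = var_n = 10.0          # Poisson number of claims
alpha, lam = 4.0, 1.0 / 5.0    # gamma claim size, in units of £100

mean_x = alpha / lam           # E[X]
var_x = alpha / lam ** 2       # V[X]

mean_s = mean_n * mean_x
var_s = mean_n * var_x + var_n * mean_x ** 2
sd_s = math.sqrt(var_s)

# Convert from units of £100 to pounds.
print(f"mean = £{100 * mean_s:,.0f}, s.d. = £{100 * sd_s:,.0f}")
```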

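8 (i) Width of 95% confidence interval: $1.96 \times \frac{120}{\sqrt{n}}$ [or $2 \times 1.96 \times \frac{120}{\sqrt{n}} = 3.92 \times \frac{120}{\sqrt{n}}$]

$= 1.96 \times \frac{120}{\sqrt{100}} = 23.52$ [or $2 \times 1.96 \times \frac{120}{\sqrt{100}}$ = £47.04]

(ii) For the width of a 95% confidence interval to be at most 10 we require $1.96 \times \frac{120}{\sqrt{n}} \le 10$

$\Rightarrow \sqrt{n} \ge \frac{1.96 \times 120}{10} = 23.52$, $n \ge 553.19$

i.e., take the sample size as 554.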

(iii) The confidence interval in (ii) is narrower; to achieve this we require a much larger sample size.

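9 (i) $\bar{X} \sim N\left(\mu, \frac{3^2}{n}\right)$ approximately, for large n, by the central limit theorem.

$0.05 = P(\text{reject } H_0 \mid \mu = 0.8) = P(\bar{X} > k \mid \mu = 0.8) = 1 - \Phi\left(\frac{k - 0.8}{3/\sqrt{n}}\right)$

$\Rightarrow \Phi\left(\frac{k - 0.8}{3/\sqrt{n}}\right) = 0.95$

and

$0.1 = P(\text{do not reject } H_0 \mid \mu = 1.2) = P(\bar{X} < k \mid \mu = 1.2) = \Phi\left(\frac{k - 1.2}{3/\sqrt{n}}\right)$

(ii) Significance level = P(reject $H_0$ when $H_0$ is true) $= P(\bar{X} > k \mid \mu = 1)$

$= 1 - \Phi\left(\frac{1.025 - 1}{3/\sqrt{482}}\right) = 1 - \Phi(0.18) = 1 - 0.57 = 0.43.$

The significance level of the test is very high (43%).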
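The three probabilities in Question 9 can be checked with the normal distribution in Python's standard library. This sketch is an addition to the report; it takes n = 482 and k = 1.025 from the question, and the helper name `reject_prob` is hypothetical.

```python
import math
from statistics import NormalDist

sigma, n, k = 3.0, 482, 1.025
se = sigma / math.sqrt(n)       # standard error of the sample mean
std_normal = NormalDist()       # standard normal, N(0, 1)

def reject_prob(mu):
    """P(sample mean > k) when the true mean is mu (normal approximation)."""
    return 1.0 - std_normal.cdf((k - mu) / se)

p_reject_at_08 = reject_prob(0.8)            # required to be about 0.05
p_accept_at_12 = 1.0 - reject_prob(1.2)      # required to be about 0.10
significance = reject_prob(1.0)              # level of the test at mu = 1

print(round(p_reject_at_08, 3), round(p_accept_at_12, 3), round(significance, 3))
```

The unrounded significance level differs slightly from 0.43 because the report rounds the standardised value to 0.18 before using tables.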

10 (i) X takes positive values only so to have such a relatively high standard deviation the distribution must be positively skewed with sizeable probability associated with high values (i.e. the model embraces high claim sizes; the density has a long or heavy tail).

(ii) (a) Solving $r = F(x)$ gives $(1 + x/10) = (1 - r)^{-0.2}$, so $x = 10[(1 - r)^{-0.2} - 1]$.

(b) $R \sim U(0, 1) \Rightarrow 1 - R \sim U(0, 1)$, so $(1 - r)$ is also a random number from (0, 1), and we can use $1 - r$ in place of r, giving the formula

$x = 10\left(r^{-0.2} - 1\right)$

(c) r = 0.0016 gives claim = £262,390; r = 0.5154 gives claim = £14,175
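The inverse-transform method of Question 10(ii)(b) can be sketched in a few lines. The function name `simulate_claim` is hypothetical; it applies x = 10(r^(-0.2) - 1) and converts from units of £10,000 to pounds.

```python
def simulate_claim(r):
    """Simulated claim in £ for a uniform(0, 1) observation r."""
    x = 10.0 * (r ** -0.2 - 1.0)   # claim in units of £10,000
    return 10_000.0 * x

# One tabulated value and the two missing values from the question.
for r in (0.7423, 0.0016, 0.5154):
    print(r, round(simulate_claim(r)))
```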

11 (i) SST = $77249 - 1203^2/20$ = 4888.55

SSB = $387^2/8 + 254^2/4 + 270^2/4 + 292^2/4 - 1203^2/20$ = 2030.675

SSR = 2857.875

H0: no treatment effects (i.e. population means are equal) v H1: not H0

Analysis of Variance
Source   DF   SS     MS    F
Factor    3   2031   677   3.79
Error    16   2858   179
Total    19   4889

$F_{3,16}(0.05) = 3.239$, $F_{3,16}(0.01) = 5.292$

P-value is lower than 0.05 (but higher than 0.01), so we can reject H0 at least at the 5% level of testing (Note: actually P-value is 0.032). The data do indicate significant differences amongst the treatment means.
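The sums of squares and the F statistic for Question 11(i) can be recomputed directly from the activity levels. The sketch below is an illustrative addition with hypothetical variable names; it does not compute the P-value, which needs F tables or a statistics package.

```python
groups = {
    "Control": [43, 40, 65, 51, 33, 39, 54, 62],
    "A": [73, 55, 61, 65],
    "B": [84, 63, 51, 72],
    "C": [46, 91, 84, 71],
}

all_y = [y for ys in groups.values() for y in ys]
n, k = len(all_y), len(groups)
correction = sum(all_y) ** 2 / n            # (sum of y)^2 / n

sst = sum(y * y for y in all_y) - correction
ssb = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - correction
ssr = sst - ssb

f_stat = (ssb / (k - 1)) / (ssr / (n - k))  # F on (3, 16) degrees of freedom
print(round(sst, 3), round(ssb, 3), round(ssr, 3), round(f_stat, 2))
```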

(ii) (a) Residual = observed value - treatment mean. Treatment means are: Control 48.375, A 63.5, B 67.5, C 73.0

The completed table of residuals is:

Control         -5.4   -8.4   16.6   2.6    -15.4  -9.4   5.6    13.6
Preparation A   9.5    -8.5   -2.5   1.5
Preparation B   16.5   -4.5   -16.5  4.5
Preparation C   -27    18     11     -2

(b) [Plot of residuals (vertical axis) against treatment means (horizontal axis), with points grouped by treatment: Control, A, B, C.]

(c) Observations $Y_{ij}$ (jth value for treatment i) are independent and normally distributed with variance $\sigma^2$ which is constant across treatments.

(d) The assumptions seem reasonable with the exception of the constant variance assumption, which is questionable: the data for preparation A appear to be less variable than the data for the other treatments.

(iii) The control mean is lower than all three treatment means (48.4 v 63.5, 67.5, 73.0) so there is prima facie evidence to support the suggestion.

One could perform a two-sample t-test of control mean = treatment mean by combining the data for the 3 preparations (and using samples of sizes 8 and 12).

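12 (i) (a) The probability function for the zero-truncated Poisson distribution is given by

$P(Y = y \mid Y > 0) = \frac{P(Y = y \text{ and } Y > 0)}{P(Y > 0)} = \frac{\lambda^y e^{-\lambda}}{y!\,(1 - P(Y = 0))} = \frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})} \quad (y = 1, 2, \ldots).$

(b) Expectation of Y:

$E[Y] = \sum_{y=1}^{\infty} y\,\frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})} = \frac{\lambda}{1 - e^{-\lambda}} \sum_{z=0}^{\infty} \frac{\lambda^z e^{-\lambda}}{z!} \quad (z = y - 1)$

$= \frac{\lambda}{1 - e^{-\lambda}}[1] = \frac{\lambda}{1 - e^{-\lambda}}.$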

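(ii) (a) The log likelihood function for $\lambda$ is:

$\log L(\lambda) = \sum_{i=1}^{n} y_i \log\lambda - n\lambda - n\log(1 - e^{-\lambda}) + \text{constant}$

$\frac{d}{d\lambda}\log L(\lambda) = \frac{n\bar{y}}{\lambda} - n - \frac{n e^{-\lambda}}{1 - e^{-\lambda}}$

so the ML estimate is determined by the solution of the equation

$\frac{d}{d\lambda}\log L(\lambda) = 0 \Rightarrow \bar{y} - \lambda - \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = 0$

As this equation may be rewritten as

$\bar{y} = \frac{\lambda}{1 - e^{-\lambda}}$ and $E[Y] = \frac{\lambda}{1 - e^{-\lambda}}$,

the ML estimate is the same as the method of moments estimate.

(b) $\frac{d^2}{d\lambda^2}\log L(\lambda) = -\frac{n\bar{y}}{\lambda^2} + \frac{n e^{-\lambda}}{(1 - e^{-\lambda})^2}$

and since $E[\bar{Y}] = E[Y] = \frac{\lambda}{1 - e^{-\lambda}}$, the Cramer-Rao lower bound is given by

$CRlb = \frac{1}{-E\left[\frac{d^2}{d\lambda^2}\log L(\lambda)\right]} = \frac{1}{n\left[\frac{1}{\lambda(1 - e^{-\lambda})} - \frac{e^{-\lambda}}{(1 - e^{-\lambda})^2}\right]}$ or $\frac{\lambda(1 - e^{-\lambda})^2}{n(1 - e^{-\lambda} - \lambda e^{-\lambda})}.$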
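As a numerical check on the bound in 12(ii)(b), the closed form $\lambda(1 - e^{-\lambda})^2 / \{n(1 - e^{-\lambda} - \lambda e^{-\lambda})\}$ can be evaluated at the values quoted in part (iii) of the question, n = 2423 and $\lambda$ = 0.8925. The function name `crlb` is hypothetical and the sketch is not part of the original report.

```python
import math

def crlb(lam, n):
    """Cramer-Rao lower bound for the zero-truncated Poisson parameter."""
    q = math.exp(-lam)
    return lam * (1.0 - q) ** 2 / (n * (1.0 - q - lam * q))

value = crlb(0.8925, 2423)
print(f"{value:.6e}")
```

The result agrees with the figure $5.711574 \times 10^{-4}$ given in the question to the precision shown there.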

(iii) (a) The expected frequencies for the fitted zero-truncated Poisson model are given by

$n\,\frac{\hat{\lambda}^y e^{-\hat{\lambda}}}{y!\,(1 - e^{-\hat{\lambda}})} \quad (y = 1, 2, \ldots)$ where $\hat{\lambda} = 0.8925$ and n = 2423

y      1         2        3        4       5      ≥6     Total
e_i    1500.48   669.59   199.20   44.45   7.93   1.35   2423.00
f_i    1486      694      195      37      10     1      2423

$\chi^2 = \sum \frac{(f_i - e_i)^2}{e_i} = \frac{(1486 - 1500.48)^2}{1500.48} + \cdots + \frac{(1 - 1.35)^2}{1.35} = 2.99$ (on 4 df).

The Yellow Book gives that the probability value is greater than 50%, therefore there is no evidence to reject the null hypothesis, i.e. the model seems appropriate for the data.

[OR $\chi^2$ = 2.68 on 3 df if $\ge 5$ combined rather than $\ge 6$.]

(b) As $\hat{\lambda} \sim N(\lambda, CRlb)$ approximately for large n, a 95% confidence interval for $\lambda$ is given by

$\hat{\lambda} \pm 1.96\sqrt{CRlb}$

$\Rightarrow 0.8925 \pm 1.96\sqrt{5.711574 \times 10^{-4}}$, since $CRlb = 5.711574 \times 10^{-4}$ at $\lambda = 0.8925$,

$= 0.8925 \pm 1.96(0.0238989) = 0.8925 \pm 0.04684 = (0.84566, 0.93934)$

Then the 95% confidence interval for the mean of Y, $\frac{\lambda}{1 - e^{-\lambda}}$, is given by

$\left(\frac{0.84566}{1 - e^{-0.84566}}, \frac{0.93934}{1 - e^{-0.93934}}\right) = (1.48, 1.54).$
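The fitted frequencies, the goodness-of-fit statistic and the two confidence intervals in 12(iii) can be reproduced as below. This sketch is an addition to the report. It assumes the last cell is treated as "6 or more" and obtained by subtraction, which is what reproduces the tabulated 1.35; the helper names are hypothetical.

```python
import math

lam_hat, n = 0.8925, 2423
observed = [1486, 694, 195, 37, 10, 1]

def ztp_pmf(y, lam):
    """Zero-truncated Poisson probability function."""
    return lam ** y * math.exp(-lam) / (math.factorial(y) * (1.0 - math.exp(-lam)))

def ztp_mean(lam):
    """Mean of the zero-truncated Poisson distribution."""
    return lam / (1.0 - math.exp(-lam))

# Expected frequencies for y = 1..5, then the remaining tail as the last cell.
expected = [n * ztp_pmf(y, lam_hat) for y in range(1, 6)]
expected.append(n - sum(expected))

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# 95% interval for lambda from the quoted bound, then mapped to the mean.
half_width = 1.96 * math.sqrt(5.711574e-4)
ci_lam = (lam_hat - half_width, lam_hat + half_width)
ci_mean = tuple(ztp_mean(l) for l in ci_lam)

print([round(e, 2) for e in expected], round(chi2, 2))
print([round(v, 5) for v in ci_lam], [round(v, 2) for v in ci_mean])
```

Mapping the end points through the mean function is valid here because $\lambda/(1 - e^{-\lambda})$ is increasing in $\lambda$.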

13 (i) $S_{mm} = 129853.03 - (1136.1)^2/10 = 780.709$

$S_{ss} = 377700.62 - (1934.2)^2/10 = 3587.656$

$S_{ms} = 221022.58 - (1136.1)(1934.2)/10 = 1278.118$

$r = \frac{1278.118}{\sqrt{(780.709)(3587.656)}} = 0.764$

$H_0: \rho = 0$ v. $H_1: \rho > 0$

$t = \frac{r\sqrt{8}}{\sqrt{1 - r^2}} = 3.35$; Prob-value = $P(t_8 > 3.35) = 0.005$ from tables.

[OR use Fisher's transformation]
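The correlation coefficient and the t statistic in 13(i) follow directly from the data summaries, as the short sketch below illustrates. It is an addition to the report with hypothetical variable names, and it stops at the test statistic because the standard library has no t distribution.

```python
import math

n = 10
sum_m, sum_m2 = 1136.1, 129853.03
sum_s, sum_s2 = 1934.2, 377700.62
sum_ms = 221022.58

s_mm = sum_m2 - sum_m ** 2 / n
s_ss = sum_s2 - sum_s ** 2 / n
s_ms = sum_ms - sum_m * sum_s / n

r = s_ms / math.sqrt(s_mm * s_ss)
t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)   # compare with t on 8 df

print(round(s_mm, 3), round(s_ss, 3), round(s_ms, 3), round(r, 3), round(t, 2))
```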

(ii) Given the issue of whether mortality can be used to predict sickness, we require a plot of sickness against mortality:

There seems to be an increasing linear relationship such that mortality could be used to predict sickness.

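(iii) $\hat{\beta} = \frac{1278.118}{780.709} = 1.6371$ and $\hat{\alpha} = \frac{1}{10}[1934.2 - \hat{\beta}(1136.1)] = 7.426$

$\hat{\sigma}^2 = \frac{1}{8}\left\{3587.656 - \frac{(1278.118)^2}{780.709}\right\} = 186.902$

$\widehat{\mathrm{Var}}[\hat{\beta}] = \frac{\hat{\sigma}^2}{780.709} = 0.2394$

Test $H_0: \beta = 2$ v. $H_1: \beta < 2$

$t = \frac{1.6371 - 2}{\sqrt{0.2394}} = -0.74$ on 8 df

Prob-value large; no evidence to reject $H_0: \beta = 2$. So we can accept that the slope is as large as 2.

(iv) For a region with m = 115:

estimated expected s = 7.426 + 1.6371(115) = 195.69

with variance $= \hat{\sigma}^2\left\{\frac{1}{10} + \frac{(115 - 113.61)^2}{780.709}\right\} = 19.1528$

95% confidence limits are:

$195.69 \pm t_8(\text{s.e.})$

$\Rightarrow 195.69 \pm 2.306(4.376) \Rightarrow 195.69 \pm 10.09$ or (185.60, 205.78)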
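The regression quantities in 13(iii) and (iv) can be recomputed from the corrected sums of squares found in part (i). The sketch below is an illustrative addition; the names `alpha`, `beta` and `sigma2` for the intercept, slope and error variance estimates are assumptions about notation, and the critical value 2.306 is the one used in the report.

```python
import math

n = 10
sum_m, sum_s = 1136.1, 1934.2
s_mm, s_ss, s_ms = 780.709, 3587.656, 1278.118   # from part (i)

beta = s_ms / s_mm                               # fitted slope
alpha = (sum_s - beta * sum_m) / n               # fitted intercept
sigma2 = (s_ss - s_ms ** 2 / s_mm) / (n - 2)     # error variance estimate

# Test statistic for H0: slope = 2, on 8 df.
t_slope = (beta - 2.0) / math.sqrt(sigma2 / s_mm)

# Estimated mean sickness rate at m = 115 with 95% confidence limits.
m0 = 115.0
fit = alpha + beta * m0
se_fit = math.sqrt(sigma2 * (1.0 / n + (m0 - sum_m / n) ** 2 / s_mm))
t_crit = 2.306
limits = (fit - t_crit * se_fit, fit + t_crit * se_fit)

print(round(beta, 4), round(alpha, 3), round(t_slope, 2))
print(round(fit, 2), [round(v, 2) for v in limits])
```

Small differences in the last decimal place against the report come from rounding of intermediate values there.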

END OF EXAMINERS REPORT

Faculty of Actuaries Institute of Actuaries

EXAMINATION

13 September 2005 (am)

Subject CT3

Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 14 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

Faculty of Actuaries CT3 S2005 Institute of Actuaries

1 The table below gives the number of thunderstorms reported in a particular summer month by 100 meteorological stations.

Number of thunderstorms:   0    1    2    3    4    5
Number of stations:        22   37   20   13   6    2

(a) Calculate the sample mean number of thunderstorms.

(b) Calculate the sample median number of thunderstorms.

(c) Comment briefly on the comparison of the mean and the median. [3]

2 In an opinion poll, each individual in a random sample of 275 individuals from a large population is asked which political party he/she supports. If 45% of the population support party A, calculate (approximately) the probability that at least 116 of the sample support A. [3]

3 Claim amounts on a certain type of policy are modelled as following a gamma distribution with parameters $\alpha = 120$ and $\lambda = 1.2$.

Calculate an approximate value for the probability that an individual claim amount exceeds 120, giving a reason for the approach you use. [3]

4 Calculate a 99% confidence interval for the percentage of claims for household accidental damage which are fully settled within six months of being submitted, given that in a random sample of 100 submitted claims of this type, exactly 83 were fully settled within six months of being submitted. [3]

5 A random sample of 500 claim amounts resulted in a mean of £237 and a standard deviation of £137.

Calculate an approximate 95% confidence interval for the true underlying mean claim amount for such claims, explaining why the normal distribution can be used.

[3]

6 A random sample of 16 observations was selected from each of four populations. Given that the between treatments mean square is 280 and the total sum of squares is SST = 1,500, construct an analysis of variance table and test the null hypothesis that the means of the four populations are equal. [3]

7 A sample of 20 claim amounts (£) on a group of household policies gave the following data summaries:

$\sum x$ = 3,256 and $\sum x^2$ = 866,600.

(a) Calculate the sample mean and standard deviation for these claim amounts.

(b) Comment on the skewness of the distribution of these claim amounts, giving reasons for your answer. [4]

8 A simple procedure for incorporating a no claims discount into an annual insurance policy is as follows:

a premium of £400 is payable for the first year;

if no claim is made in the first year, the premium for the second year is £400k, where k is a constant such that 0 < k < 1;

if no claim is made in the first and second years, the premium for the third year is £400$k^2$;

if no subsequent claims are made in future years, the premium remains as £400$k^2$;

if a claim is made, the premium the following year reverts to £400 and the procedure starts again as above.

(i) Show that the probability distribution of the premium for the fourth year, that is, for the year following the third year, is given by

£400 with probability p
£400k with probability $p(1 - p)$
£400$k^2$ with probability $(1 - p)^2$

where p is the probability of a claim being made in any year. [3]

(ii) Obtain an expression for the expected premium for the fourth year under this procedure. [1]

(iii) If it is desired that this expected premium should equal £300, determine the required value of k for the case where p = 0.1. [3]

[Total 7]

9 Random samples are taken from two independent normally distributed populations (with means $\mu_1$ and $\mu_2$ respectively) with the following results.

Sample from population 1:

sample size $n_1 = 11$, sample mean $\bar{x}_1 = 124$, sample variance $s_1^2 = 59$

Sample from population 2:

sample size $n_2 = 15$, sample mean $\bar{x}_2 = 105$, sample variance $s_2^2 = 42$

Calculate a 95% confidence interval for $\mu_1 - \mu_2$, the difference between the population means (you may assume that the population variances are equal).

[5]

10 Let X denote a random variable with a continuous uniform (0, 1000) distribution. Define a random variable Y as the minimum of X and 800.

(i) Show that the conditional distribution of X given X < 800 is a continuous uniform (0, 800) distribution. [2]

(ii) Verify (giving clear reasons) that the expectation of the random variable Y is 480. [3]

(iii) Suppose that $Y_1, \ldots, Y_n$ are independent and identically distributed, each with the same distribution as Y.

In the case that n is large, determine the approximate distribution of $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, stating its expectation. (You are not required to derive or state the variance of Y.) [1]

(iv) Comment on the comparison of the conditional expectation of X given X < 800 with the expectation of Y. [2]

[Total 8]

11 Consider the following simple model for the number of claims, N, which occur in a year on a policy:

n          0      1      2      3
P(N = n)   0.55   0.25   0.15   0.05

(a) Explain how you would simulate an observation of N using a number r, an observation of a random variable which is uniformly distributed on (0, 1).

(b) Illustrate your method described in (a) by simulating three observations of N using the following random numbers between 0 and 1:

0.6221, 0.1472, 0.9862. [4]

12 A certain type of insurance policy has a claim rate of $\lambda$ per year and the cover ceases and the policy expires after the first claim. Accordingly the duration of a policy is modelled by an exponential distribution with density function $\lambda e^{-\lambda x}$ : x > 0.

A company has data on (m + n) policies which have expired and which may be assumed to be independent. Of these, m policies had duration less than 5 years and n policies had duration greater than or equal to 5 years.

(i) An investigator makes note of the actual durations, $x_1, \ldots, x_n$, of the latter group of n policies, but ignores the former group without even noting the value of m.

(a) Explain why the $x_i$'s come from a truncated exponential distribution with density function

$f(x) = k \cdot \lambda e^{-\lambda x}, \quad x > 5$

and show that $k = e^{5\lambda}$.

(b) Write down the likelihood for the data from the point of view of this investigator and hence show that the maximum likelihood estimate (MLE) of $\lambda$ is given by

$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i - 5n}$.

(c) The data yield the values: n = 10 and $\sum x_i = 71$. Calculate this investigator's MLE of $\lambda$.

[8]

(ii) A second investigator ignores the actual policy durations and simply notes the values of m and n.

(a) Write down the likelihood for this information and hence show that the resulting MLE of $\lambda$ is given by

$\hat{\lambda} = \frac{1}{5}\log\left(\frac{m + n}{n}\right)$.

(b) The same data as in part (i) yield the values: m = 120 and n = 10. Calculate this investigator's MLE of $\lambda$.

[5]

(iii) The two investigators decide to pool their data, and so have the information that there are m policies with duration less than 5 years, and n policies with actual durations x1, ... , xn.

(a) Explain why the likelihood for this joint information is given by

$L(\lambda) = (1 - e^{-5\lambda})^m \cdot \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$

and determine an equation, the solution of which will lead to the MLE of $\lambda$.

(b) Given that this leads to an MLE of $\lambda$ equal to 0.508, comment on the comparison of the three MLEs.

[5] [Total 18]

13 A survey, carried out at a major flower and gardening show, was concerned with the association between the intention to return to the show next year and the purchase of goods at this year's show. There were 220 people interviewed and of these 101 had made a purchase; 69 of these people said they intended to return next year. Of the 119 who had not made a purchase, 68 said they intended to return next year.

(i) Suppose one of the 220 people surveyed is selected at random.

Calculate the probabilities that the selected person:

(a) intends to return next year, given that he/she has made a purchase (b) intends to return next year, given that he/she has not made a purchase (c) has made a purchase, given that he/she intends to return next year

[3]

(ii) By testing the difference between the proportions of purchasers and non-purchasers who intend to return next year, examine whether there is sufficient evidence to justify concluding that the intention to return depends on whether or not a purchase was made. [7]

(iii) Present the data as a contingency table and perform a $\chi^2$ test of association between the attributes "intention to return" and "purchasing status". [5]

(iv) Discuss briefly the connection between the comparison of proportions carried out in part (ii) and the test of association performed in part (iii). [2]

[Total 17]

14 The data given in the following table are the numbers of deaths from AIDS in Australia for 12 consecutive quarters starting from the second quarter of 1983.

Quarter (i):                 1   2   3   4   5   6   7    8    9    10   11   12
Number of deaths ($n_i$):    1   2   3   1   4   9   18   23   31   20   25   37

(i) (a) Draw a scatterplot of the data.

(b) Comment on the nature of the relationship between the number of deaths and the quarter in this early phase of the epidemic.

[4]

(ii) A statistician has suggested that a model of the form

$E[N_i] = \gamma i^2$

might be appropriate for these data, where $\gamma$ is a parameter to be estimated from the above data. She has proposed two methods for estimating $\gamma$, and these are given in parts (a) and (b) below.

(a) Show that the least squares estimate of $\gamma$, obtained by minimising $q = \sum_{i=1}^{12} (n_i - \gamma i^2)^2$, is given by

$\hat{\gamma} = \frac{\sum_{i=1}^{12} i^2 n_i}{\sum_{i=1}^{12} i^4}$.

(b) Show that an alternative (weighted) least squares estimate of $\gamma$, obtained by minimising $q^* = \sum_{i=1}^{12} \frac{(n_i - \gamma i^2)^2}{i^2}$, is given by

$\tilde{\gamma} = \frac{\sum_{i=1}^{12} n_i}{\sum_{i=1}^{12} i^2}$.

(c) Noting that $\sum_{i=1}^{12} i^4$ = 60,710 and $\sum_{i=1}^{12} i^2$ = 650, calculate $\hat{\gamma}$ and $\tilde{\gamma}$ for the above data. [8]

(iii) To assess whether the single parameter model which was used in part (ii) is appropriate for the data, a two parameter model is now considered. The model is of the form

$E[N_i] = \gamma i^{\theta}$ for i = 1, ..., 12.

(a) To estimate the parameters $\gamma$ and $\theta$, a simple linear regression model

$E[Y_i] = \alpha + \beta x_i$

is used, where $x_i = \log(i)$ and $Y_i = \log(N_i)$ for i = 1, ..., 12. Relate the parameters $\gamma$ and $\theta$ to the regression parameters $\alpha$ and $\beta$.

(b) The least squares estimates of $\alpha$ and $\beta$ are -0.6112 and 1.6008 with standard errors 0.4586 and 0.2525 respectively (you are not asked to verify these results).

Using the value for the estimate of $\beta$, conduct a formal statistical test to assess whether the form of the model suggested in (ii) is adequate.

[7] [Total 19]

END OF PAPER

Faculty of Actuaries Institute of Actuaries

EXAMINATION

September 2005

Subject CT3

Probability and Mathematical Statistics Core Technical

EXAMINERS' REPORT

Subject CT3 (Probability and Mathematical Statistics Core Technical)

September 2005

Examiners' Report

1 (a) $\bar{x} = \frac{150}{100} = 1.5$

(b) Median is (101/2)th observation i.e. the mean of the 50th and 51st observations

so median = 1

(c) Mean is higher than the median as the data are positively skewed (skewed to the right).

2 Let X denote the number in the sample who support party A.

X ~ Binomial(275, 0.45)

E[X] = 275 × 0.45 = 123.75, V[X] = 275 × 0.45 × 0.55 = 68.0625

The normal approximation to the binomial gives, using a continuity correction,

$P(X \ge 116) \approx P(X > 115.5) = 1 - \Phi\left(\dfrac{115.5 - 123.75}{\sqrt{68.0625}}\right) = 1 - \Phi(-1) = \Phi(1) = 0.841$
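The continuity-corrected calculation above can be checked numerically. The following is a minimal sketch, assuming the event of interest is X ≥ 116 (the inequality sign is not legible in the source); the helper `normal_cdf` is a name introduced here for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, p = 275, 0.45
mean = n * p                    # binomial mean
sd = sqrt(n * p * (1 - p))      # binomial standard deviation
z = (115.5 - mean) / sd         # boundary after the continuity correction
prob = 1.0 - normal_cdf(z)      # normal approximation to P(X >= 116)
print(f"z = {z:.3f}, probability = {prob:.3f}")
```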

3 Gamma(120, 1.2) has mean $\dfrac{120}{1.2} = 100$ and variance $\dfrac{120}{1.2^2} = 83.333$

$X \approx N(100, 9.129^2)$ by the Central Limit Theorem (since the gamma variable is the sum of 120 independent gamma(1, 1.2) variables)

$P(X > 120) \approx P\left(Z > \dfrac{120 - 100}{9.129}\right) = P(Z > 2.191) = 1 - 0.98578 = 0.0142$

4 Sample proportion = 0.83

99% CI for the population proportion is $0.83 \pm [2.5758 \times (0.83 \times 0.17/100)^{1/2}]$ i.e. $0.83 \pm 0.0968$ i.e. (0.733, 0.927)

99% CI for percentage is thus (73.3% , 92.7%)

5 n = 500 is very large, so the Central Limit Theorem justifies normality.

95% CI is $\bar{x} \pm 1.96\dfrac{s}{\sqrt{n}}$

$237 \pm 1.96 \times \dfrac{137}{\sqrt{500}} = 237 \pm 12.0$

or (£225.0, £249.0)

6
Source of variation   d.f.   SS     MSS
Between groups        3      840    280
Residual              60     660    11
Total                 63     1500

F = 280/11 which equals 25.45.

Therefore there is overwhelming evidence to suggest that the four population means are not equal ($F_{3,60}(5\%) = 2.758$, $F_{3,60}(1\%) = 4.126$).

7 (a) $\bar{x} = \dfrac{1}{20}(3256) = 162.8$

$s^2 = \dfrac{1}{19}\left(866600 - \dfrac{3256^2}{20}\right) = 17711.7 \Rightarrow s = 133.1$

(b) Distribution must have strong positive skewness since the s.d. is large relative to the mean and the amounts must be positive.

8 (i) Using C for a claim, N for no claim, then

P(premium = 400) = P(C in year 3, regardless of the first 2 years) = p

P(premium = 400k) = P(CN in years 2/3, regardless of the first year) = p(1 − p)

P(premium = 400k²) = P(NN in years 2/3, regardless of the first year) = (1 − p)²

[These probabilities may be derived in other ways, such as via a tree diagram]

(ii) $E(\text{premium}) = 400p + 400k\,p(1 - p) + 400k^2(1 - p)^2 = 400\{p + kp(1 - p) + k^2(1 - p)^2\}$

(iii) For E(premium) = 300 when p = 0.1: $0.1 + 0.09k + 0.81k^2 = 0.75 \Rightarrow 0.81k^2 + 0.09k - 0.65 = 0$

$k = \dfrac{-0.09 \pm \sqrt{0.09^2 + 4(0.81)(0.65)}}{1.62} = \dfrac{-0.09 \pm 1.454}{1.62}$, and for $0 < k < 1$

$k = 0.84$
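The root selection in part (iii) can be reproduced with a short script. This is only a sketch: the function name `discount_factor` and its arguments are hypothetical, chosen here to mirror the quadratic in k derived above.

```python
from math import sqrt


def discount_factor(p: float, target_ratio: float) -> float:
    """Solve p + k*p*(1-p) + k^2*(1-p)^2 = target_ratio for the root with 0 < k < 1."""
    a = (1 - p) ** 2          # coefficient of k^2
    b = p * (1 - p)           # coefficient of k
    c = p - target_ratio      # constant term
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real root")
    roots = [(-b + sign * sqrt(disc)) / (2 * a) for sign in (1, -1)]
    valid = [k for k in roots if 0 < k < 1]
    if len(valid) != 1:
        raise ValueError("expected exactly one root in (0, 1)")
    return valid[0]


# Expected premium 300 on a base premium of 400, with claim probability 0.1.
k = discount_factor(p=0.1, target_ratio=300 / 400)
print(f"k = {k:.3f}")
```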

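9 Pooled estimate of common population variance $= \dfrac{10 \times 59 + 14 \times 42}{10 + 14} = 49.0833$

$t_{24}(0.025) = 2.064$

95% CI for $\mu_1 - \mu_2$ is given by

$124 - 105 \pm 2.064 \times \left[49.0833\left(\dfrac{1}{11} + \dfrac{1}{15}\right)\right]^{1/2}$ i.e. $19 \pm 5.74$ i.e. (13.3, 24.7)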

10 X ~ U(0,1000) , Y = min(X,800)

(i) $P(X \le x \mid X < 800) = \dfrac{P(X \le x \text{ and } X < 800)}{P(X < 800)} = \dfrac{P(X \le x)}{800/1000}$ for $0 < x < 800$

$= \dfrac{x/1000}{800/1000} = \dfrac{x}{800}$ for $0 < x < 800$

so the conditional distribution is U(0,800)

[other reasonable arguments were given credit, e.g. the conditional distribution is simply a scaled version of the original uniform distribution on a restricted range.]

(ii) $E[Y] = E[X \mid X < 800]\,P(X < 800) + 800\,P(X \ge 800)$

$= 400 \times \dfrac{800}{1000} + 800 \times \dfrac{200}{1000}$

= 480

(iii) $\bar{Y}$ is approximately normal with expectation 480 by the Central Limit Theorem

(iv) E[X | X < 800] = 400 whereas E[Y] = 480.

The higher value for E[Y] results from 20% of the Y values being 800 (and 80% being between 0 and 800).

11 (a) For
$0 \le r < 0.55$: n = 0
$0.55 \le r < 0.8$: n = 1
$0.8 \le r < 0.95$: n = 2
$0.95 \le r \le 1$: n = 3

[OR any equivalent allocation which reflects the probabilities of the 4 values of N.]

(b) 0.6221 ⇒ n = 1
0.1472 ⇒ n = 0
0.9862 ⇒ n = 3
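The allocation in part (a) is an inverse-transform rule for a discrete distribution. A minimal sketch follows, assuming the cut points 0.55, 0.8 and 0.95 given above; the function name `simulate_claims` is introduced here for illustration only.

```python
from bisect import bisect_right

# Cumulative probabilities separating n = 0, 1, 2, 3 (taken from the allocation above).
CUT_POINTS = [0.55, 0.80, 0.95]


def simulate_claims(r: float) -> int:
    """Map a U(0,1) random number r to a simulated value n by inverse transform."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    # bisect_right counts how many cut points are <= r, which is exactly n.
    return bisect_right(CUT_POINTS, r)


simulated = [simulate_claims(r) for r in (0.6221, 0.1472, 0.9862)]
print(simulated)
```

Using `bisect_right` places a value equal to a cut point in the upper interval, which matches the closed lower ends of the ranges in part (a).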

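12 (i) (a) The $x_i$'s are known to be such that $x_i > 5$, and therefore have a density which is a scaled form of $\lambda e^{-\lambda x}$ for $5 < x < \infty$.

The scaling constant k is such that $k\int_5^{\infty} \lambda e^{-\lambda x}\,dx = 1$

$\Rightarrow k\left[-e^{-\lambda x}\right]_5^{\infty} = 1 \Rightarrow k\,e^{-5\lambda} = 1 \Rightarrow k = e^{5\lambda}$

[Note: this can be argued in other ways; e.g. by referring to a conditional density and dividing by P(X > 5)]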

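(b) $L(\lambda) = \prod_{i=1}^{n} \lambda e^{5\lambda} e^{-\lambda x_i} = \lambda^n e^{5n\lambda} e^{-\lambda\sum x_i}$

$\log L(\lambda) = n\log\lambda + 5n\lambda - \lambda\sum x_i$

$\dfrac{d}{d\lambda}\log L(\lambda) = \dfrac{n}{\lambda} + 5n - \sum x_i$

equate to zero for MLE $\Rightarrow \hat\lambda = \dfrac{n}{\sum_{i=1}^{n} x_i - 5n}$

[OR It could be noted that $X - 5 \sim \exp(\lambda)$ and that the MLE is therefore the reciprocal of the mean of the data $(x_i - 5)$, giving the required answer]

(c) n = 10, $\sum x_i = 71 \Rightarrow \hat\lambda = \dfrac{10}{71 - 50} = 0.476$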

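(ii) (a) $L(\lambda) \propto (1 - e^{-5\lambda})^m (e^{-5\lambda})^n$

$\log L(\lambda) = m\log(1 - e^{-5\lambda}) + n\log(e^{-5\lambda})$ (up to an additive constant)

$\dfrac{d}{d\lambda}\log L(\lambda) = \dfrac{5m\,e^{-5\lambda}}{1 - e^{-5\lambda}} - 5n$

equate to zero for MLE $\Rightarrow \dfrac{e^{-5\lambda}}{1 - e^{-5\lambda}} = \dfrac{n}{m}$

$\Rightarrow e^{-5\lambda} = \dfrac{n}{m + n} \Rightarrow \hat\lambda = \dfrac{1}{5}\log\left(\dfrac{m + n}{n}\right)$

[OR Reason via the MLE for a binomial p = P(X > 5) such that $\hat{p} = \dfrac{n}{m + n}$ and $p = e^{-5\lambda}$]

(b) m = 120, n = 10 $\Rightarrow \hat\lambda = 0.513$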
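The two closed-form estimates from parts (i) and (ii) are easy to evaluate side by side. The sketch below assumes the exponential rate is the parameter being estimated (its symbol is not legible in the source); the function names and the `cutoff` argument are hypothetical.

```python
from math import log


def mle_from_durations(sum_x: float, n: int, cutoff: float = 5.0) -> float:
    """Rate MLE from n exact durations, all known to exceed the cutoff."""
    return n / (sum_x - cutoff * n)


def mle_from_counts(m: int, n: int, cutoff: float = 5.0) -> float:
    """Rate MLE when only the counts below (m) and above (n) the cutoff are known."""
    return log((m + n) / n) / cutoff


rate_durations = mle_from_durations(sum_x=71.0, n=10)   # part (i)(c)
rate_counts = mle_from_counts(m=120, n=10)              # part (ii)(b)
print(f"{rate_durations:.3f} {rate_counts:.3f}")
```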

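(iii) (a) $(1 - e^{-5\lambda})^m$ is the likelihood of observing m policies with duration < 5

$\prod_{i=1}^{n} \lambda e^{-\lambda x_i}$ is the likelihood of observing the actual durations $x_1, \dots, x_n$

and independence leads to the product of these

$\log L(\lambda) = m\log(1 - e^{-5\lambda}) + n\log\lambda - \lambda\sum x_i$

$\dfrac{d}{d\lambda}\log L(\lambda) = \dfrac{5m\,e^{-5\lambda}}{1 - e^{-5\lambda}} + \dfrac{n}{\lambda} - \sum x_i$

equate to zero and the solution gives the MLE.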

(b) All three are re-assuringly close.

The pooled estimate is between the first two (as expected, but it is closer to 0.513).

13
             purchase   no purchase   total
return       69         68            137
no return    32         51            83
total        101        119           220

(i) (a) 69/101 (= 0.6832) (b) 68/119 (= 0.5714) (c) 69/137 (= 0.5036)

(ii) H0: population proportions of those who intend to return are equal v H1: not H0

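Proportion of purchasers $\hat{p}_1 = 69/101$; proportion of non-purchasers $\hat{p}_2 = 68/119$

Under H0, estimate of common proportion who intend to return = 137/220

Observed value of $D = \hat{p}_1 - \hat{p}_2 = 0.1117$

Estimated standard error of $D = \left[\dfrac{137}{220} \times \dfrac{83}{220} \times \left(\dfrac{1}{101} + \dfrac{1}{119}\right)\right]^{1/2} = 0.06558$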

P-value = 2 × P(D > 0.1117) = 2 × P(Z > 0.1117/0.06558) = 2 × P(Z > 1.70)

= 2 × (1 − 0.95543) = 0.08914 (i.e. 8.9%)

There is not sufficient evidence (using a two-sided test) to justify rejecting H0

i.e. there is not sufficient evidence to justify concluding that the intention to return depends on whether or not a purchase was made.

(iii) H0: no association between attributes v H1: not H0

Expected frequencies under H0 in brackets:

             purchase     no purchase   total
return       69 (62.9)    68 (74.1)     137
no return    32 (38.1)    51 (44.9)     83
total        101          119           220

Test statistic $\chi^2 = 6.1^2\left(\dfrac{1}{62.9} + \dfrac{1}{74.1} + \dfrac{1}{38.1} + \dfrac{1}{44.9}\right) = 2.90$

P-value $= P(\chi^2_1 > 2.90) = 1 - 0.9114 = 0.0886$ (i.e. 8.9%)

There is not sufficient evidence to justify rejecting H0 i.e. there is not sufficient evidence to justify concluding that the intention to return is associated with purchasing status.

(iv) The two approaches complement each other: the P-values are the same and the conclusions are the same.

[Note: there is a formal connection: the $\chi^2_1$ value in (iii) (2.90) is the square of the z value in (ii) (1.70).]
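The link between the two tests can be confirmed numerically from the 2 × 2 table. The sketch below computes the pooled two-proportion z statistic and the (uncorrected) chi-square statistic without intermediate rounding, so the figures differ slightly in the third decimal from the hand calculation; the variable names are introduced here for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Rows: return / no return; columns: purchase / no purchase.
table = [[69, 68], [32, 51]]
row_totals = [sum(r) for r in table]
col_totals = [table[0][j] + table[1][j] for j in range(2)]
total = sum(row_totals)

# Two-proportion z test with the pooled estimate under H0.
p1 = table[0][0] / col_totals[0]
p2 = table[0][1] / col_totals[1]
pooled = row_totals[0] / total
se = sqrt(pooled * (1 - pooled) * (1 / col_totals[0] + 1 / col_totals[1]))
z = (p1 - p2) / se
p_value = 2 * (1 - normal_cdf(abs(z)))

# Chi-square test of association (no continuity correction).
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (table[i][j] - expected) ** 2 / expected

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi2 = {chi2:.3f}, p = {p_value:.3f}")
```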

14 (i) (a) Points are shown on scatterplot.

(b) The mean number of deaths increases at an increasing rate with quarter.

The variance also appears to increase with quarter.

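(ii) (a) $q = \sum (n_i - \theta i^2)^2$

$\dfrac{dq}{d\theta} = -2\sum i^2 (n_i - \theta i^2)$

$\dfrac{dq}{d\theta} = 0 \Rightarrow -2\sum i^2 (n_i - \theta i^2) = 0$

$\Rightarrow \sum i^2 n_i - \theta\sum i^4 = 0$

$\Rightarrow \hat\theta = \dfrac{\sum i^2 n_i}{\sum i^4}$.

($\dfrac{d^2 q}{d\theta^2} = 2\sum i^4 > 0 \Rightarrow$ minimum.)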

[Figure: scatterplot of number of deaths against quarter]

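(b) $q^* = \sum \dfrac{(n_i - \theta i^2)^2}{i^2} = \sum \left(\dfrac{n_i}{i} - \theta i\right)^2$

$\dfrac{dq^*}{d\theta} = -2\sum i\left(\dfrac{n_i}{i} - \theta i\right)$

$\dfrac{dq^*}{d\theta} = 0 \Rightarrow \sum (n_i - \theta i^2) = 0$

$\Rightarrow \tilde\theta = \dfrac{\sum n_i}{\sum i^2}$

($\dfrac{d^2 q^*}{d\theta^2} = 2\sum i^2 > 0 \Rightarrow$ minimum.)

(c) $\hat\theta = \dfrac{\sum i^2 n_i}{\sum i^4} = \dfrac{15694}{60710} = 0.259$

$\tilde\theta = \dfrac{\sum n_i}{\sum i^2} = \dfrac{174}{650} = 0.268$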
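Both estimators in part (ii) are simple ratios of sums, so they can be sketched in a few lines. The quarterly counts themselves are not reproduced in this part of the document, so the example below evaluates the estimates from the quoted sums and then checks the formulas on synthetic counts (an assumption made purely for illustration); the names `ls_estimates`, `theta_hat` and `theta_tilde` are hypothetical.

```python
def ls_estimates(counts):
    """Return (unweighted, weighted) least squares estimates for E[N_i] = theta * i^2."""
    idx = range(1, len(counts) + 1)
    unweighted = sum(i * i * n for i, n in zip(idx, counts)) / sum(i ** 4 for i in idx)
    weighted = sum(counts) / sum(i * i for i in idx)
    return unweighted, weighted


# Estimates from the sums quoted in the solution.
theta_hat = 15694 / 60710     # sum(i^2 * n_i) / sum(i^4)
theta_tilde = 174 / 650       # sum(n_i) / sum(i^2)
print(f"{theta_hat:.3f} {theta_tilde:.3f}")

# Synthetic check: counts that follow n_i = 2 * i^2 exactly should give 2 for both.
synthetic = [2 * i * i for i in range(1, 13)]
check_unweighted, check_weighted = ls_estimates(synthetic)
```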

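(iii) (a) $E[N_i] = \gamma i^{\theta}$

Taking logs gives

$\log E[N_i] = \log(\gamma i^{\theta}) = \log\gamma + \theta\log(i) = \log\gamma + \theta x_i$

Thus $\alpha = \log\gamma$ and $\beta = \theta$.

[OR $\gamma = e^{\alpha}$ and $\theta = \beta$.]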

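(b) $\hat\beta = 1.6008$, s.e.$(\hat\beta) = 0.2525$

H0: $\beta = 2$ v H1: $\beta \ne 2$

$t = \dfrac{\hat\beta - 2}{\text{s.e.}(\hat\beta)} = \dfrac{1.6008 - 2}{0.2525} = -1.58$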

Compare with a t-distribution with 10 d.f.

As the 5% critical value of a two-tailed test is 2.228, do not reject the null hypothesis.

Therefore, the model used in (ii) with $\beta = \theta = 2$ seems appropriate.

END OF EXAMINERS REPORT

Faculty of Actuaries Institute of Actuaries

EXAMINATION

30 March 2006 (am)

Subject CT3

Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

Faculty of Actuaries CT3 A2006 Institute of Actuaries

1 The stem and leaf plot below gives the surrender values (to the nearest 1,000) of 40 endowment policies issued in France and recently purchased by a dealer in such policies in Paris. The stem unit is 10,000 and the leaf unit is 1,000.

5 | 3
5 | 6
6 | 02
6 | 5779
7 | 122344
7 | 556677899
8 | 1123444
8 | 567778
9 | 024
9 | 6

Determine the median surrender value for this batch of policies. [2]

2 In a certain large population 45% of people have blood group A. A random sample of 300 individuals is chosen from this population.

Calculate an approximate value for the probability that more than 115 of the sample have blood group A. [3]

3 A random sample of size 10 is taken from a normal distribution with mean $\mu = 20$ and variance $\sigma^2 = 1$.

Find the probability that the sample variance exceeds 1, that is find $P(S^2 > 1)$. [3]

4 In a one-way analysis of variance, in which samples of 10 claim amounts (£) from each of three different policy types are being compared, the following means were calculated:

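$\bar{y}_{1\cdot} = 276.7,\ \bar{y}_{2\cdot} = 254.6,\ \bar{y}_{3\cdot} = 296.3$

with residual sum of squares $SS_R$ given by

$SS_R = \sum_{i=1}^{3}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 = 15{,}508.6$

Calculate estimates for each of the parameters in the usual mathematical model, that is, calculate $\hat\mu, \hat\tau_1, \hat\tau_2, \hat\tau_3$, and $\hat\sigma^2$. [4]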

5 A large portfolio of policies is such that a proportion p (0 < p < 1) incurred claims during the last calendar year. An investigator examines a randomly selected group of 25 policies from the portfolio.

(i) Use a Poisson approximation to the binomial distribution to calculate an approximate value for the probability that there are at most 4 policies with claims in the two cases where (a) p = 0.1 and (b) p = 0.2. [3]

(ii) Comment briefly on the above approximations, given that the exact values of the probabilities in part (i), using the binomial distribution, are 0.9020 and 0.4207 respectively. [2]

[Total 5]

6 One variable of interest, T, in the description of a physical process can be modelled as T = XY where X and Y are random variables such that X ~ N(200, 100) and Y depends on X in such a way that Y|X = x ~ N(x, 1).

Simulate two observations of T, using the following pairs of random numbers (observations of a uniform (0, 1) random variable), explaining your method and calculations clearly:

Random numbers

0.5714 , 0.8238

0.3192 , 0.6844

[6]

7 Let $(X_1, X_2, \dots, X_n)$ be a random sample from a uniform distribution on the interval $(-\theta, \theta)$, where $\theta$ is an unknown positive number.

A particular sample of size 5 gives values 0.87, −0.43, 0.12, −0.92, and 0.58.

(i) Draw a rough graph of the likelihood function $L(\theta)$ against $\theta$ for this sample. [3]

(ii) State the value of the maximum likelihood estimate of $\theta$. [2] [Total 5]

8 The events that lead to potential claims on a policy arise as a Poisson process at a rate of 0.8 per year. However the policy is limited such that only the first three claims in any one year are paid.

(i) Determine the probabilities of 0, 1, 2 and 3 claims being paid in a particular year. [2]

(ii) The amounts (in units of £100) for the claims paid follow a gamma distribution with parameters $\alpha = 2$ and $\lambda = 1$.

Calculate the expectation of the sum of the amounts for the claims paid in a particular year. [3]

(iii) Calculate the expectation of the sum of the amounts for the claims paid in a particular year, given that there is at least one claim paid in the year. [2]

[Total 7]

9 The total claim amount on a portfolio, S, is modelled as having a compound distribution

$S = X_1 + X_2 + \dots + X_N$

where N is the number of claims and has a Poisson distribution with mean $\lambda$, $X_i$ is the amount of the ith claim, and the $X_i$'s are independent and identically distributed and independent of N. Let $M_X(t)$ denote the moment generating function of $X_i$.

(i) Show, using a conditional expectation argument, that the cumulant generating function of S, $C_S(t)$, is given by

$C_S(t) = \lambda\{M_X(t) - 1\}$.

Note: You may quote the moment generating function of a Poisson random variable from the book of Formulae and Tables. [4]

(ii) Calculate the variance of S in the case where $\lambda = 20$ and X has mean 20 and variance 10. [2]

[Total 6]

10 A marketing consultant was commissioned to conduct a questionnaire survey of the clients of a financial company. The total number of respondents was 650, of whom 220 had investments above a specified threshold.

(i) Each respondent who had investments above the threshold was asked about the percentage of these investments that was held in the form of a certain type of trust. The respondents answered by ticking appropriate boxes and the results led to the following frequency distribution.

percentage   < 10   10–25   25–50   > 50
frequency    22     76      73      49

(a) Present these data graphically using a carefully drawn histogram.

(b) Calculate the mean percentage for the full set of 220 such respondents, assuming that the frequencies in each category are uniformly spread over the corresponding range. [5]

(ii) Calculate a 95% confidence interval for the percentage of such investors who would have investments above the threshold. [4]

The same respondents with investments referred to in part (i) were also asked to specify their satisfaction with the current return received from their full portfolio of investment. This was in the form of a four-point qualitative scale: very satisfied, quite satisfied, a little disappointed, very disappointed. The following two-way table of frequencies was obtained.

                        percentage in type of trust
                        <10   10–25   25–50   >50
very satisfied          1     6       7       6
quite satisfied         8     29      36      27
a little disappointed   10    37      28      15
very disappointed       3     4       2       1

In order to investigate whether there is any relationship between the percentage in such trusts and satisfaction with current return, a $\chi^2$ test is to be performed.

(iii) Calculate the expected frequencies for the above table under an appropriate hypothesis (which should be stated) and comment on why it would be inappropriate to carry out a $\chi^2$ test directly with these data. [3]

(iv) Combining the very satisfied and quite satisfied categories together and the a little disappointed and very disappointed categories together results in the following reduced two-way table.

               percentage in type of trust
               <10   10–25   25–50   >50
satisfied      9     35      43      33
disappointed   13    41      30      16

Perform the required $\chi^2$ test at the 5% level using this reduced table and comment on your conclusion. [7]

[Total 19]

11 An actuary has been advised to use the following positively-skewed claim size distribution as a model for a particular type of claim, with claim sizes measured in units of £100,

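$f(x; \theta) = \dfrac{x^2}{2\theta^3}\exp\left(-\dfrac{x}{\theta}\right), \quad 0 < x < \infty,\ \theta > 0$

with moments given by $E[X] = 3\theta$, $E[X^2] = 12\theta^2$ and $E[X^3] = 60\theta^3$.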

(i) Determine the variance of this distribution and calculate the coefficient of skewness. [4]

(ii) Let $X_1, X_2, \dots, X_n$ be a random sample of n claim sizes for such claims.

Show that the maximum likelihood estimator (MLE) of $\theta$ is given by $\hat\theta = \dfrac{\bar{X}}{3}$ and show that it is unbiased for $\theta$. [5]

(iii) A sample of n = 50 claim sizes yields $\sum x_i = 313.6$ and $\sum x_i^2 = 2{,}675.68$.

(a) Calculate the MLE $\hat\theta$.

(b) Calculate the sample variance and comment briefly on its comparison with the variance of the distribution evaluated at $\hat\theta$.

(c) Given that the sample coefficient of skewness is 1.149, comment briefly on its comparison with the coefficient of skewness of the distribution.

[4]

(iv) (a) Write down a large-sample approximate 95% confidence interval for the mean of the distribution in terms of the sample mean $\bar{x}$ and the sample variance $s^2$. Hence obtain an approximate 95% confidence interval for $\theta$ and evaluate this for the data in part (iii) above.

(b) Evaluate the variance of the distribution at both the lower and upper limits of this confidence interval and comment briefly with reference to your answer in part (iii)(b) above.

[5] [Total 18]

12 In an experiment to compare the effects of vaccines of differing strengths intended to give protection to children against a particular condition, twelve batches of vaccine were tested in twelve equal-sized groups of children. The percentages of children who subsequently remained healthy after exposure to the condition, named the PRH values, were recorded. The strength of each batch of vaccine was measured by an independent test and recorded as the SV value.

The recorded values are:

Batch:     1     2     3     4     5     6     7     8     9     10    11    12
PRH (y):   16    68    23    35    42    41    46    48    52    50    54    53
SV (x):    0.9   1.6   2.3   2.7   3.0   3.3   3.7   3.8   4.1   4.2   4.3   4.5

$\sum x = 38.4;\ \sum y = 528;\ \sum x^2 = 137.16;\ \sum y^2 = 25{,}428;\ \sum xy = 1{,}778.4$

(i) Draw a rough plot of the data to show the relationship between the SV and

PRH values. [2]

It is evident that one of the observations is out of line and so may have an undue effect on any regression analysis. You are asked to investigate this as follows.

(ii) (a) Calculate the total, regression, and error sums of squares for a least-squares linear regression analysis for predicting PRH values from SV values using all 12 data observations.

(b) Determine the coefficient of determination R2.

(c) Determine the equation of the fitted regression line.

(d) Examine whether or not there is evidence, at the 5% level of testing, to enable one to conclude that the slope of the underlying regression equation is non-zero.

[11]

The details of the regression analysis after removing the data for batch 2 are given in the box below.

(iii) (a) Comment on the main differences in the results of the regression analysis resulting from removing the data for batch 2.

(b) Calculate a 95% confidence interval for the expected (mean) PRH value for a batch of vaccine with SV value 3.5.

[9] [Total 22]

END OF PAPER

Regression equation: y = 3.76 + 11.4 x

             Coef     Stdev    t-ratio   p-val
Intercept    3.757    3.092    1.22      0.255
x            11.377   0.8838   12.9      0.000

Analysis of Variance
Source       df   SS       MS       F        p-val
Regression   1    1486.9   1486.9   165.69   0.000
Error        9    80.8     8.98
Total        10   1567.6

Faculty of Actuaries Institute of Actuaries

EXAMINATION

April 2006

Subject CT3

Probability and Mathematical Statistics Core Technical

EXAMINERS REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

M Flaherty Chairman of the Board of Examiners

June 2006

Faculty of Actuaries Institute of Actuaries

Comments

Comments on answers presented by candidates are given below. Note that in some cases variations on the solutions given are possible; the examiners gave credit for all sensible comments and correct solutions.

Question 6

This question was the worst one on the paper as regards quality of answers. The question linked the concept of a conditional distribution with the simulation of observations of normal random variables (Core Reading Unit 6, section 1.3 and Unit 4, section 5.2). There were few good answers. Many candidates simply did not submit answers, suggesting that they were not familiar with the basic approach to the simulation of observations despite the fact that there were short questions on this topic in both immediately previous papers, for which solutions are readily available.

Question 7

The likelihood function in this question is not of standard form and expressing and graphing it correctly requires a good understanding of the likelihood concept. Many candidates did not think clearly about the range of values of $\theta$ for which the likelihood is positive and for which it is zero and so got the wrong graph.

Question 8

Many candidates ignored the fact that "only the first three claims in any one year are paid". Suppose Y denotes the number of claims which arise, then Y ~ Poisson(0.8). Suppose X denotes the number of claims which are paid. Many candidates worked with the set of probabilities P(X = i) = P(Y = i), i = 0, 1, 2, 3.

But these four probabilities do not sum to 1 and so do not provide a proper probability distribution for X. What is required is the set of four probabilities P(X = i) = P(Y = i), i = 0, 1, 2 together with P(X = 3) = P(Y ≥ 3).

Question 9

The wording of the question made it clear that candidates could assume the mgf of a Poisson random variable and, armed with this information, should use "a conditional expectation argument". Full marks were not awarded to candidates who jumped into the middle of the argument by assuming the mgf of a compound Poisson random variable.

Question 10

Candidates should be aware that when constructing a histogram with unequal group widths one must ensure that the areas (and not the heights) of the rectangles are proportional to the frequencies.

In part (ii), many candidates calculated a confidence interval for a different proportion to the one asked for.

Question 11

Many candidates were unsure of the definition of the coefficient of skewness (Core Reading Unit 3, section 3.4).

Question 12

In part (ii)(a), many candidates calculated $S_{yy}$, $S_{xx}$, and $S_{xy}$ but did not go on to calculate the regression and error sums of squares ($SS_{REG}$, $SS_{RES}$) as asked for. In part (iii)(a) many candidates failed to make any of the most pertinent possible comments (but credit was given for relevant comments other than those given here).

1 n = 40, so the median is the 20.5th observation, which is ½(7.7+7.8) = 7.75. This represents 77,500.
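The median can be recovered mechanically from the stem and leaf plot. The sketch below assumes the flattened plot in the question splits into ten stem rows (two rows per stem from 5 to 9), a reading that gives the stated total of 40 leaves; the list `stem_leaf` encodes that reading and is an assumption of this example.

```python
from statistics import median

# (stem, leaves) rows; stem unit 10,000 and leaf unit 1,000.
stem_leaf = [
    (5, "3"), (5, "6"),
    (6, "02"), (6, "5779"),
    (7, "122344"), (7, "556677899"),
    (8, "1123444"), (8, "567778"),
    (9, "024"), (9, "6"),
]

# Expand each leaf into a surrender value in currency units.
values = [(10 * stem + int(leaf)) * 1000 for stem, leaves in stem_leaf for leaf in leaves]
median_value = median(values)   # mean of the 20th and 21st ordered values
print(len(values), median_value)
```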

2 If X is the number in the sample with group A, then X has a binomial (300, 0.45) distribution, so

E[X] = 300 × 0.45 = 135 and Var[X] = 300 × 0.45 × 0.55 = 74.25.

Then, using the continuity correction,

$P(X > 115) \approx P(X > 115.5) = 1 - \Phi\left(\dfrac{115.5 - 135}{\sqrt{74.25}}\right) = 1 - \Phi(-2.26) = \Phi(2.26) = 0.99$.

3 $\dfrac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, so here $9S^2 \sim \chi^2_9$

$P(S^2 > 1) = P(\chi^2_9 > 9) = 1 - 0.5627 = 0.437$ (tables p165)

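4 $\hat\mu = \dfrac{1}{3}(276.7 + 254.6 + 296.3) = 275.87$

$\hat\tau_1 = 276.7 - 275.87 = 0.83$

$\hat\tau_2 = 254.6 - 275.87 = -21.27$

$\hat\tau_3 = 296.3 - 275.87 = 20.43$

$\hat\sigma^2 = \dfrac{SS_R}{27} = \dfrac{15508.6}{27} = 574.4$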

5 (i) Let X = number of policies with claims. So X ~ binomial(25, p). Poisson approximation is X ≈ Poisson(25p).

(a) using Poisson(2.5), P(X ≤ 4) = 0.89118 from tables [or evaluation]

(b) using Poisson(5), P(X ≤ 4) = 0.44049 again from tables

(ii) in (a) error is 0.8912 − 0.9020 = −0.0108; in (b) error is 0.4405 − 0.4207 = 0.0198. The approximation is valid for small p and, as p is smaller in (a), this gives a better approximation, as noted with the smaller error.

[Note: candidates may also comment on the fact that the sample size 25 is not large and so we would not expect the Poisson approximations to be very good anyway. In fact the key to the approximations is the small p and here the given approximations are quite good]
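The comparison between the exact binomial probabilities and their Poisson approximations can be reproduced directly. This is a minimal sketch using only the standard library; `binom_cdf` and `poisson_cdf` are helper names introduced here.

```python
from math import comb, exp, factorial


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(exp(-mu) * mu ** j / factorial(j) for j in range(k + 1))


n, k = 25, 4
results = {}
for p in (0.1, 0.2):
    exact = binom_cdf(k, n, p)
    approx = poisson_cdf(k, n * p)
    results[p] = (exact, approx, approx - exact)
    print(f"p = {p}: exact {exact:.4f}, Poisson {approx:.4f}, error {approx - exact:+.4f}")
```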

6 Solving P(Z < z) = 0.5714 ⇒ z = 0.180 ⇒ x = 200 + 10(0.180) = 201.80

Solving P(Z < z) = 0.8238 ⇒ z = 0.930 ⇒ y = 201.80 + 0.930 = 202.73

⇒ t = 201.80 × 202.73 = 40911

Solving P(Z < z) = 0.3192 ⇒ z = −0.470 ⇒ x = 200 + 10(−0.470) = 195.3

Solving P(Z < z) = 0.6844 ⇒ z = 0.480 ⇒ y = 195.3 + 0.480 = 195.78

⇒ t = 195.3 × 195.78 = 38236
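The two-stage simulation above (first X, then Y given X = x) can be sketched with the inverse normal CDF from the standard library. The function name `simulate_t` is hypothetical, and because the quantiles are not rounded to three decimals the results agree with the hand calculation only to within a few units.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def simulate_t(u1: float, u2: float) -> float:
    """Simulate T = XY from two U(0,1) random numbers by inverse transform."""
    x = 200 + 10 * STANDARD_NORMAL.inv_cdf(u1)   # X ~ N(200, 100), so sd = 10
    y = x + STANDARD_NORMAL.inv_cdf(u2)          # Y | X = x ~ N(x, 1)
    return x * y


t1 = simulate_t(0.5714, 0.8238)
t2 = simulate_t(0.3192, 0.6844)
print(round(t1), round(t2))
```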

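7 (i) $L(\theta) = \prod_{i=1}^{n}\dfrac{1}{2\theta} = \left(\dfrac{1}{2\theta}\right)^n = \dfrac{c}{\theta^n}$ for $-\theta < x_i < \theta$, $i = 1, 2, \dots, n$, and $L(\theta) = 0$ otherwise

So, as $\theta$ increases from zero, $L(\theta)$ is zero until it reaches the largest observation in absolute value, i.e. max $|x_i|$, $i = 1, 2, \dots, n$. For the data given, this value is 0.92.

It is then a decreasing function. Hence the graph is as below:

[Figure: rough graph of $L(\theta)$ against $\theta$, zero up to $\theta = 0.92$ and decreasing thereafter]

(ii) The maximum value of $L(\theta)$ is attained at the largest absolute value of the data. The ML estimate of $\theta$ is 0.92.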

8 (i) By subtraction using entries in tables for Poisson(0.8), the probabilities for the Poisson distribution for 0, 1, 2 and 3 are: [or by evaluation]

0.44933, 0.35946, 0.14379 and (1 − 0.95258) = 0.04742

(ii) Let N = number of claims paid and let $X_1, \dots, X_N$ be the claim amounts; then $S = \sum X_i$ is the sum of the amounts.

E[S] = E[N]E[X]

Here E[N] = 1(0.35946) + 2(0.14379) + 3(0.04742) = 0.7893

and E[X] = 2/1 = 2 from gamma(2,1)

So E[S] = (0.7893)(2) = 1.5786, i.e. £157.86

(iii) Given that N > 0, divide the probabilities in part (i) by (1 − 0.44933) = 0.55067 to give the probabilities for 1, 2 and 3 claims paid as:

0.6528, 0.2611 and 0.0861

E[N] = 1(0.6528) + 2(0.2611) + 3(0.0861) = 1.4333

So E[S] = (1.4333)(2) = 2.8666 = £286.66
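The capped claim-count distribution and the two expectations can be checked with a few lines of code. This is a sketch under the assumptions stated in the solution (Poisson rate 0.8, at most three claims paid, mean claim amount 2 in units of £100); the variable names are introduced here for illustration.

```python
from math import exp, factorial

rate, cap, mean_claim = 0.8, 3, 2.0

# P(N = k) for k = 0, 1, 2 from the Poisson distribution; the tail is lumped into N = 3.
poisson_probs = [exp(-rate) * rate ** k / factorial(k) for k in range(cap)]
paid_probs = poisson_probs + [1.0 - sum(poisson_probs)]

expected_n = sum(k * p for k, p in enumerate(paid_probs))
expected_s = expected_n * mean_claim                         # E[S] = E[N] E[X]
expected_s_given_claim = expected_s / (1.0 - paid_probs[0])  # condition on N > 0

print([round(p, 5) for p in paid_probs])
print(round(expected_s, 4), round(expected_s_given_claim, 4))
```

Conditioning on N > 0 only rescales the probabilities of 1, 2 and 3 claims, so dividing E[S] by P(N > 0) gives the same result as recomputing E[N] from the rescaled probabilities.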

9 (i) $M_S(t) = E[e^{tS}] = E[E[e^{tS} \mid N]]$

Now $E[e^{tS} \mid N = n] = E[\exp(tX_1 + \dots + tX_n)] = \prod_{i=1}^{n} E[\exp(tX_i)] = \{M_X(t)\}^n$

$M_S(t) = E[\{M_X(t)\}^N] = E[\exp\{N\log M_X(t)\}] = M_N\{\log M_X(t)\}$

$= \exp[\lambda\{M_X(t) - 1\}]$ since N ~ Poisson($\lambda$)

$C_S(t) = \log M_S(t) = \lambda\{M_X(t) - 1\}$

(ii) $V[S] = C_S''(0) = \lambda M_X''(0) = \lambda E[X^2] = 20(10 + 20^2) = 8200$

OR $V[S] = E[N]V[X] + V[N]\{E[X]\}^2 = 20 \times 10 + 20 \times 20^2 = 8200$

10 (i) (a) The key feature of the histogram is that the areas of the four rectangles should be proportional to the frequencies.

See histogram below.

(b) Mean is calculated from the following frequency distribution:

x   5    17.5   37.5   75
f   22   76     73     49

$\sum f = 220$, $\sum fx = 7852.5 \Rightarrow \bar{x} = \dfrac{7852.5}{220} = 35.7\%$

(ii) Estimated proportion is $\hat{p} = \dfrac{220}{650} = 0.338$ (or 33.8%)

95% confidence interval for underlying proportion is

$\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{650}}$

$\Rightarrow 0.338 \pm 1.96\sqrt{\dfrac{0.338(0.662)}{650}} \Rightarrow 0.338 \pm 0.036$

as a percentage: 33.8% ± 3.6% or (30.2%, 37.4%)

(iii) Under the null hypothesis of no association between percentage in type of trust and satisfaction with current return, expected frequencies are

                        <10    10–25   25–50   >50    total
very satisfied          2.0    6.9     6.6     4.5    20
quite satisfied         10.0   34.5    33.2    22.3   100
a little disappointed   9.0    31.1    29.9    20.0   90
very disappointed       1.0    3.5     3.3     2.2    10
total                   22     76      73      49     220

six are less than 5, which would invalidate a $\chi^2$ test

(iv) expected frequencies (e) are

12.000   41.455   39.818   26.727
10.000   34.545   33.182   22.273

table of residuals (o − e) is

−3.000   −6.455   +3.182   +6.273
+3.000   +6.455   −3.182   −6.273

table of contributions to $\chi^2$ is

0.750   1.005   0.254   1.472
0.900   1.206   0.305   1.767

giving $\chi^2 = 7.659$ on 3 d.f.

$\chi^2_3(5\%) = 7.815 \Rightarrow$ must accept the null hypothesis that there is no relationship between percentage in type of trust and satisfaction with current return.

However this decision to accept is marginal at the 5% level and there is some evidence, but not strong, to suggest that satisfaction improves as the percentage increases.
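The expected frequencies and the test statistic for the reduced table can be recomputed from the observed counts. The following is a minimal sketch; the 5% critical value 7.815 is the tabulated figure quoted in the solution, and the variable names are introduced here.

```python
# Rows: satisfied / disappointed; columns: <10, 10-25, 25-50, >50.
observed = [[9, 35, 43, 33], [13, 41, 30, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_5pct = 7.815   # tabulated chi-square(3) upper 5% point
reject_h0 = chi2 > critical_5pct
print(f"chi2 = {chi2:.3f} on {df} d.f., reject H0: {reject_h0}")
```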

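11 (i) $\sigma^2 = E[X^2] - [E(X)]^2 = 12\theta^2 - (3\theta)^2 = 3\theta^2$

$\mu_3 = E[X^3] - 3\mu E[X^2] + 2\mu^3$

$= 60\theta^3 - 3(3\theta)(12\theta^2) + 2(3\theta)^3$

$= (60 - 108 + 54)\theta^3 = 6\theta^3$

coefficient of skewness $= \dfrac{6\theta^3}{(3\theta^2)^{3/2}} = \dfrac{6}{3\sqrt{3}} = 1.155$

[OR: note that X ~ gamma(3, 1/$\theta$) and use formulae in tables, so var = $3\theta^2$ and coef. of skew. = $\dfrac{2}{\sqrt{3}}$]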

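(ii) $L(\theta) = \prod_{i=1}^{n}\dfrac{x_i^2}{2\theta^3}\exp\left(-\dfrac{x_i}{\theta}\right) = \dfrac{\prod x_i^2}{2^n\theta^{3n}}\exp\left(-\dfrac{\sum x_i}{\theta}\right)$

$\log L(\theta) = \log\left(\prod x_i^2\right) - n\log 2 - 3n\log\theta - \dfrac{\sum x_i}{\theta}$

$\dfrac{\partial}{\partial\theta}\log L(\theta) = -\dfrac{3n}{\theta} + \dfrac{\sum x_i}{\theta^2}$

equate to zero: $\dfrac{3n}{\theta} = \dfrac{\sum x_i}{\theta^2} \Rightarrow \hat\theta = \dfrac{\sum x_i}{3n} = \dfrac{\bar{x}}{3}$

this clearly maximises $L(\theta)$ [or consider $\dfrac{\partial^2}{\partial\theta^2}\log L(\theta)$]

So MLE is $\hat\theta = \dfrac{\sum X_i}{3n} = \dfrac{\bar{X}}{3}$

$E[\hat\theta] = E\left[\dfrac{\bar{X}}{3}\right] = \dfrac{1}{3}E[\bar{X}] = \dfrac{1}{3}(3\theta) = \theta \Rightarrow$ unbiased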

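(iii) (a) $\bar{x} = \dfrac{313.6}{50} = 6.272 \Rightarrow \hat\theta = \dfrac{6.272}{3} = 2.091$

(b) $s^2 = \dfrac{1}{49}\left(2675.68 - \dfrac{313.6^2}{50}\right) = 14.465$

$\sigma^2 = 3\theta^2$ and $3\hat\theta^2 = 13.117$

$s^2$ is a bit larger but still quite close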

(c) sample coefficient 1.149 is very close to the distribution value 1.155

(iv) (a) approximate 95% CI for $\mu$ is $\bar{x} \pm 1.96\sqrt{\dfrac{s^2}{n}}$

as $\mu = 3\theta$, divide by 3 for an approximate 95% CI for $\theta$: $\dfrac{1}{3}\left(\bar{x} \pm 1.96\sqrt{\dfrac{s^2}{n}}\right)$

for data: $\dfrac{1}{3}\left(6.272 \pm 1.96\sqrt{\dfrac{14.465}{50}}\right) \Rightarrow 2.091 \pm 0.351$ or (1.740, 2.442)

(b) $\sigma^2 = 3\theta^2$ = 9.083 at lower limit of 1.740, = 17.890 at upper limit of 2.442

$s^2 = 14.465$ is well within these values, confirming that $s^2$ is quite close to $3\hat\theta^2$.
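The estimate, the sample variance and the confidence interval in parts (iii) and (iv) follow directly from the two quoted sums. The sketch below carries full precision throughout, so the interval limits can differ from the hand calculation in the third decimal place; the variable names are introduced here for illustration.

```python
from math import sqrt

n, sum_x, sum_x2 = 50, 313.6, 2675.68

x_bar = sum_x / n
theta_hat = x_bar / 3                              # MLE: sample mean divided by 3
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)           # sample variance
model_var = 3 * theta_hat ** 2                     # distribution variance at the MLE

half_width = 1.96 * sqrt(s2 / n) / 3               # CI for the mean, scaled by 1/3
lower, upper = theta_hat - half_width, theta_hat + half_width
var_lower, var_upper = 3 * lower ** 2, 3 * upper ** 2

print(f"theta_hat = {theta_hat:.3f}, s2 = {s2:.3f}, model variance = {model_var:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}); variance range: ({var_lower:.2f}, {var_upper:.2f})")
```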

12 (i) [Figure: plot of PRH against SV]

(ii) (a) $SS_{TOT} = S_{yy} = 25428 - 528^2/12 = 2196$

$S_{xx} = 137.16 - 38.4^2/12 = 14.28$, $S_{xy} = 1778.4 - (38.4 \times 528)/12 = 88.8$

$SS_{REG} = 88.8^2/14.28 = 552.20$, $SS_{RES} = 2196 - 552.20 = 1643.80$

(b) $R^2 = 552.2/2196 = 0.251$ (25.1%)

(c) y = a + bx: $\hat{b} = 88.8/14.28 = 6.2185$

$\hat{a} = 528/12 - (88.8/14.28) \times (38.4/12) = 24.101$

Fitted line is y = 24.101 + 6.2185x

(d) s.e.$(\hat{b}) = \left(\dfrac{1643.8/10}{14.28}\right)^{1/2} = 3.3928$

Observed t = (6.2185 − 0)/3.3928 = 1.833 < $t_{10}(0.025)$ = 2.228, so we do not have evidence at the 5% level of testing to justify rejecting b = 0 and concluding that the underlying slope is non-zero.
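The regression quantities in part (ii) can be recomputed from the twelve recorded (SV, PRH) pairs given in the question. This is a sketch using only the standard library; the critical value 2.228 is the tabulated figure quoted in the solution, and the variable names are introduced here.

```python
from math import sqrt

sv = [0.9, 1.6, 2.3, 2.7, 3.0, 3.3, 3.7, 3.8, 4.1, 4.2, 4.3, 4.5]   # x values
prh = [16, 68, 23, 35, 42, 41, 46, 48, 52, 50, 54, 53]              # y values
n = len(sv)

s_xx = sum(x * x for x in sv) - sum(sv) ** 2 / n
s_yy = sum(y * y for y in prh) - sum(prh) ** 2 / n
s_xy = sum(x * y for x, y in zip(sv, prh)) - sum(sv) * sum(prh) / n

slope = s_xy / s_xx
intercept = sum(prh) / n - slope * sum(sv) / n

ss_tot = s_yy
ss_reg = s_xy ** 2 / s_xx
ss_res = ss_tot - ss_reg
r_squared = ss_reg / ss_tot

se_slope = sqrt(ss_res / (n - 2) / s_xx)
t_stat = slope / se_slope
slope_significant = abs(t_stat) > 2.228   # tabulated t(10) upper 2.5% point

print(f"y = {intercept:.3f} + {slope:.4f} x, R^2 = {r_squared:.3f}, t = {t_stat:.3f}")
```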

(iii) (a) Large change in slope (and intercept) of fitted line.

The total and error sums of squares are much reduced.

The fit of the linear regression model is much improved ($R^2$ is much increased, from 25.1% to 94.9%).

We have overwhelming evidence to justify concluding that the slope is non-zero.

(b) Fitted PRH value at SV = 3.5 is 3.757 + (11.377 × 3.5) = 43.577

$n = 11$, $\sum x = 38.4 - 1.6 = 36.8$, $\sum x^2 = 137.16 - 1.6^2 = 134.6$

$S_{xx} = 11.4873$

s.e. of estimation $= \left[8.98\left(\dfrac{1}{11} + \dfrac{(3.5 - 36.8/11)^2}{11.4873}\right)\right]^{1/2} = 0.9138$

$t_9(0.025) = 2.262$

95% CI for expected PRH is 43.577 ± (2.262 × 0.9138)

i.e. 43.577 ± 2.067 or (41.51, 45.64)

END OF EXAMINERS REPORT

Faculty of Actuaries Institute of Actuaries

EXAMINATION

7 September 2006 (am)

Subject CT3

Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

Faculty of Actuaries CT3 S2006 Institute of Actuaries

1 A bag contains 8 black and 6 white balls. Two balls are drawn out at random, one after the other and without replacement.

Calculate the probabilities that:

(i) The second ball drawn out is black. [1]

(ii) The first ball drawn out was white, given that the second ball drawn out is black. [2]

[Total 3]

2 Let A and B denote independent events.

Show that A and $\bar{B}$, the complement of event B, are also independent events. [3]

3 Consider 12 independent insurance policies, numbered 1, 2, 3, …, 12, for each of which a maximum of 1 claim can occur. For each policy, the probability of a claim occurring is 0.1.

Find the probability that no claims arise on the group of policies numbered 1, 2, 3, 4, 5 and 6, and exactly 1 claim arises in total on the group of policies numbered 7, 8, 9, 10, 11, and 12. [3]

4 In a large portfolio 65% of the policies have been in force for more than five years. An investigation considers a random sample of 500 policies from the portfolio.

Calculate an approximate value for the probability that fewer than 300 of the policies in the sample have been in force for more than five years. [3]

5 In a random sample of 200 policies from a company's private motor business, there are 68 female policyholders and 132 male policyholders.

Let the proportion of policyholders who are female in the corresponding population of all policyholders be denoted π.

Test the hypotheses

H0: π = 0.4 v H1: π < 0.4

stating clearly the approximate probability value of the observed statistic and your conclusion. [4]


6 It is assumed that claims arising on an industrial policy can be modelled as a Poisson process at a rate of 0.5 per year.

(i) Determine the probability that no claims arise in a single year. [1]

(ii) Determine the probability that, in three consecutive years, there is one or more claims in one of the years and no claims in each of the other two years. [2]

(iii) Suppose a claim has just occurred. Determine the probability that more than two years will elapse before the next claim occurs. [2]

[Total 5]

7 A commuter catches a bus each morning for 100 days. The buses arrive at the stop according to a Poisson process, at an average rate of one per 15 minutes, so if Xi is the waiting time on day i, then Xi has an exponential distribution with parameter 1/15, so

E[Xi] = 15, Var[Xi] = 15² = 225.

(i) Calculate (approximately) the probability that the total time the commuter spends waiting for buses over the 100 days exceeds 27 hours. [3]

(ii) At the end of the 100 days the bus frequency is increased, so that buses arrive at one per 10 minutes on average (still behaving as a Poisson process). The commuter then catches a bus each day for a further 99 days. Calculate (approximately) the probability that the total time spent waiting over the whole 199 days exceeds 40 hours. [3]

[Total 6]


8 Let $\bar{X}_1$ denote the mean of a random sample of size n from a normal population with mean μ and variance $\sigma_1^2$, and let $\bar{X}_2$ denote the mean of a random sample also of size n from a normal population with the same mean μ but with variance $\sigma_2^2$. The two samples are independent.

Define W as the weighted average of the sample means

$W = \alpha \bar{X}_1 + (1 - \alpha) \bar{X}_2$

(i) Show that W is an unbiased estimator of μ. [1]

(ii) Obtain an expression for the mean square error of W. [2]

(iii) Show that the value of α for which W has minimum mean square error is given by

$\alpha = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$,

and verify that the optimum corresponds to a minimum. [3]

(iv) Consider the special case when the variances of the two random samples are equal to a common value $\sigma^2$. State (do not derive) the maximum likelihood estimator of μ calculated from the combined samples, and compare it with the estimator obtained in (iii). [2]

[Total 8]

9 (i) Show that for continuous random variables X and Y:

E[Y] = E[E(Y|X)]. [3]

(ii) Suppose that a random variable X has a standard normal distribution, and the conditional distribution of a Poisson random variable Y, given the value of X = x, has expectation g(x) = x² + 1.

Determine E[Y] and Var[Y]. [5]

[Total 8]


10 Let (X1, X2, …, Xn) be a random sample of a gamma(4.5, λ) random variable, with sample mean $\bar{X}$.

(i) (a) Using moment generating functions, show that $2\lambda n\bar{X} \sim \chi^2_{9n}$.

(b) Construct a 95% confidence interval for λ, based on $\bar{X}$ and the result in (i)(a) above.

(c) Evaluate the interval in (i)(b) above in the case for which a random sample of 10 observations gave a value $\sum x_i = 21.47$. [9]

(ii) (a) Show that the maximum likelihood estimator of λ is given by $\hat{\lambda} = \frac{4.5}{\bar{X}}$.

(b) Show that the asymptotic standard error of $\hat{\lambda}$ is estimated by $\frac{\hat{\lambda}}{(4.5n)^{1/2}}$.

(c) Construct a 95% confidence interval for λ based on the asymptotic distribution of $\hat{\lambda}$, and evaluate this interval in the case for which a random sample of 100 observations gave a value $\sum x_i = 225.3$.

[9] [Total 18]


11 A survey of financial institutions which offered tax-efficient savings accounts was conducted. These accounts had limits on the amounts of money that could be invested each year. The study was interested in comparing the maturity values achieved by investing the maximum possible amount each year over a certain time period.

The following values (in units of £1,000 and rounded to 2 decimal places) are the maturity values for such investments for 8 high street banks and 12 other banks (i.e. those without high street branches).

High street banks (x1): 11.91, 11.87, 11.83, 11.66, 11.53, 11.49, 11.49, 11.42

(Σx1 = 93.20, Σx1² = 1,086.0470)

Other banks (x2): 12.23, 12.17, 12.16, 11.90, 11.82, 11.77, 11.74, 11.70, 11.64, 11.60, 11.55, 11.50

(Σx2 = 141.78, Σx2² = 1,675.8224)

(i) Draw a diagram in which the maturity values for high street and other banks may be compared. [2]

(ii) Calculate a 95% confidence interval for the difference between the means of the maturity values for high street and other banks, and comment on any implications suggested by the interval. [6]

(iii) (a) Show that a test of the equality of variances of maturity values for high street banks and other banks is not significant at the 5% level.

(b) Comment briefly on the validity of the assumptions required for the interval in (ii). [4]

The following values (in units of £1,000 and rounded to 2 decimal places) are the maturity values for the maximum possible investment for a random sample of 12 building societies (a different kind of financial institution).

Building societies (x3): 12.40, 12.19, 12.06, 12.01, 12.00, 11.97, 11.94, 11.92, 11.88, 11.86, 11.81, 11.79

(Σx3 = 143.83, Σx3² = 1,724.2449)

(iv) Add further points to your diagram in part (i) such that the maturity values for all three types of financial institution may be compared. [1]

(v) Use one-way analysis of variance to compare the maturity values for the three different types of financial institution, and comment briefly on the validity of the assumptions required for analysis of variance. [6]

(vi) Interpret the results of the statistical analyses conducted in (ii) and (v). [2] [Total 21]


12 A development engineer examined the relationship between the speed a vehicle is travelling (in miles per hour, mph), and the stopping distance (in metres, m) for a new braking system fitted to the vehicle. The following data were obtained in a series of independent tests conducted on a particular type of vehicle under identical road conditions.

Speed of vehicle (x):    10  20  30  40  50  60  70
Stopping distance (y):    5  10  23  34  40  54  75

Σx = 280, Σy = 241, Σx² = 14,000, Σy² = 11,951, Σxy = 12,790

(i) Construct a scatterplot of the data, and comment on whether a linear regression is appropriate to model the relationship between the stopping distance and speed. [4]

(ii) Calculate the equation of the least-squares fitted regression line. [5]

(iii) Calculate a 95% confidence interval for the slope of the underlying regression line, and use this confidence interval to test the hypothesis that the slope of the underlying regression line is equal to 1. [5]

(iv) Use the fitted line obtained in part (ii) to calculate estimates of the stopping distance for a vehicle travelling at 50 mph and for a vehicle travelling at 100 mph.

Comment briefly on the reliability of these estimates. [4] [Total 18]

END OF PAPER


Faculty of Actuaries Institute of Actuaries

EXAMINATION

September 2006

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

Introduction The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable. M A Stocker Chairman of the Board of Examiners November 2006

© Faculty of Actuaries © Institute of Actuaries


Subject CT3 (Probability and Mathematical Statistics Core Technical) — September 2006 — Examiners’ Report

Page 2

Comments

Comments on answers presented by candidates are given below. Note that in some cases variations on the solutions given are possible; the examiners gave credit for all sensible comments and correct solutions.

The most common problems noted by the examiners are summarised below.

Some candidates were unsure of basic concepts in probability (such as the independence of two events) and gave poor answers to Questions 2 and 3.

Question 5
Many candidates used $\hat{\pi} = 0.34$ (wrongly) rather than $\pi = 0.4$ (correctly) in the expression for the standard error of the estimate (the sample proportion) under H0. However, it makes little difference numerically, and the examiners were generous on this point when marking.

Question 7 was poorly attempted, with many candidates failing to realise that the distribution of the total waiting time can be approximated by a normal distribution, by virtue of the central limit theorem.

Question 8
Some candidates did not know the result on the variance of the mean of a random sample of size n, namely $\mathrm{Var}[\bar{X}] = \sigma^2/n$.

Question 9
Some candidates displayed a lack of familiarity with the use of conditional expectations, and in particular with the application of the result

Var[Y] = Var[E(Y|X)] + E[Var(Y|X)]

Question 10
Some candidates did not know that the asymptotic standard error of a maximum likelihood estimator is found from evaluating $1/\sqrt{I}$, where $I = E\left[-\frac{d^2\ell}{d\lambda^2}\right]$ and $\ell(\lambda)$ is the log-likelihood.


Question 11

In the part on equality of variances (part (iii)(a)) some candidates who worked with $s_1^2/s_2^2$ (= 0.607) did not know how to find the lower 2.5% point of F7,11 (which is the reciprocal of the upper 2.5% point of F11,7, and is approximately 1/4.71 = 0.212).


1 (i) P(second ball drawn is B) = P(first ball drawn is B) = 8/14 = 0.571

OR P(1st B and 2nd B) + P(1st W and 2nd B) = (8/14)×(7/13) + (6/14)×(8/13) = 8/14

(ii) P(1st W | 2nd B) = P(1st W and 2nd B)/P(2nd B) = (6/14)×(8/13)/(8/14) = 6/13 = 0.462

2 Since A and B are independent, $P(A) = P(A|B) = P(A|\bar{B})$.

Noting that $\bar{\bar{B}} = B$, it follows immediately that $P(A) = P(A|\bar{B}) = P(A|\bar{\bar{B}})$ and so A and $\bar{B}$ are independent.

[OR Since A and B are independent, P(A ∩ B) = P(A)P(B).

Thus $P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)\{1 - P(B)\} = P(A)P(\bar{B})$

∴ A and $\bar{B}$ are independent.]

3 P(no claims on 6 policies) = 0.5314 (from tables p186, or using $0.9^6$)

P(1 claim on 6 policies) = 0.8857 − 0.5314 = 0.3543 (or using $6(0.1)(0.9^5)$)

So required probability = 0.5314 × 0.3543 = 0.188.

4 Let X be the number in force for more than five years; then X ~ binomial(500, 0.65).

Using a normal approximation, X ≈ N(325, 10.665²).

P(X < 300) becomes P(X < 299.5) using continuity correction

$= P\left(Z < \frac{299.5 - 325}{10.665}\right)$ where Z ~ N(0,1)

= P(Z < −2.39) = 1 − 0.99158 = 0.0084
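The numerical answers to Questions 1, 3 and 4 above can be reproduced with the Python standard library alone. The following is a sketch, not part of the examiners' report: the helper names `phi` and `binom_pmf` are illustrative, and the normal CDF is evaluated through `math.erf` instead of tables.

```python
from fractions import Fraction
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Question 1: 8 black and 6 white balls, two drawn without replacement
p_white_then_black = Fraction(6, 14) * Fraction(8, 13)
p_black_then_black = Fraction(8, 14) * Fraction(7, 13)
p_second_black = p_white_then_black + p_black_then_black
p_first_white_given_second_black = p_white_then_black / p_second_black

# Question 3: no claims on policies 1-6 and exactly one claim on policies 7-12
p_q3 = binom_pmf(0, 6, 0.1) * binom_pmf(1, 6, 0.1)

# Question 4: normal approximation to binomial(500, 0.65) with continuity correction
mean = 500 * 0.65
sd = sqrt(500 * 0.65 * 0.35)
p_q4 = phi((299.5 - mean) / sd)

print(p_second_black, p_first_white_given_second_black, round(p_q3, 3), round(p_q4, 4))
```

Using `Fraction` for Question 1 keeps the conditional probability exact, which makes the cancellation to 6/13 visible without rounding.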


5 Under H0: sample proportion P is approximately normally distributed with mean 0.4 and standard error $(0.4 \times 0.6/200)^{1/2} = 0.03464$

∴ P-value of observed proportion (68/200 = 0.34)

$= P\left(Z < \frac{0.34 - 0.4}{0.03464}\right) = P(Z < -1.732) = 0.042$

We reject H0 at the 5% level of testing and conclude that the proportion of policyholders who are female is less than 0.4.

[OR (this is actually better) working with the number of female policyholders (observed = 68), the P-value is

$P\left(Z < \frac{68.5 - 80}{\sqrt{200(0.4)(0.6)}}\right) = P(Z < -1.660) = 0.048$

]
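Both versions of the P-value calculation for Question 5 can be checked numerically. The sketch below evaluates the normal CDF with `math.erf` rather than tables, so the figures agree with the solution only to rounding; the variable names are illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p0, females = 200, 0.4, 68

# Version 1: sample proportion, with the standard error evaluated under H0
se_prop = sqrt(p0 * (1 - p0) / n)
p_value_prop = phi((females / n - p0) / se_prop)

# Version 2: count of female policyholders, with a continuity correction
sd_count = sqrt(n * p0 * (1 - p0))
p_value_count = phi((females + 0.5 - n * p0) / sd_count)

print(round(se_prop, 5), round(p_value_prop, 3), round(p_value_count, 3))
```

The continuity-corrected version gives the slightly larger P-value, which is why the two routes lead to 4.2% and 4.8% respectively.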

Note: We can word the conclusion: we reject H0 at levels of testing down to 4.2% (or 4.8%) and conclude …

6 (i) P(no claims) = P(X = 0) where X ~ Poisson(0.5)

= 0.60653 from tables [or evaluation]

(ii) Let Y = number of years with a claim; then Y ~ binomial(3, 0.3935) [or just directly as below]

P(Y = 1) = 3(0.3935)(0.6065)² = 0.434

(iii) Let T = time until next claim; then T ~ exp(0.5)

P(T > 2) = $e^{-0.5(2)}$ [or by integration] = $e^{-1}$ = 0.368

[OR: answer = {P(no claim)}² = 0.60653² = 0.368]

[OR: claim rate for period of 2 years = 1, so P(no claim in 2 years) = $e^{-1}$ = 0.368]

7 (i) As stated in the question, if Xi is the waiting time on day i, then Xi has an exponential distribution with parameter 1/15, so E(Xi) = 15, Var(Xi) = 15² = 225.

If X is the total waiting time over the 100 days, $X = \sum_{i=1}^{100} X_i$,


so E[X] = 1500 and Var[X] = 22500 and by the CLT X has approximately an N(1500, 22500) distribution,

so P(X > 1620) ≈ $1 - \Phi\left(\frac{1620 - 1500}{150}\right)$ = 1 − Φ(0.8) = 0.2119.

(ii) If Yj is the waiting time on day j of the extra 99 days, then E(Yj) = 10 and Var(Yj) = 100, so that if $Y = \sum_{j=1}^{99} Y_j$ is the total waiting time over the 99 days, then Y is approximately N(990, 9900) by CLT.

If Z = X + Y (so that Z is the total waiting time over the whole 199 days), then since X and Y are independent, Z is approximately N(1500 + 990, 22500 + 9900), i.e. N(2490, 32400).

Hence P(Z > 2400) ≈ $1 - \Phi\left(\frac{2400 - 2490}{180}\right)$ = 1 − Φ(−0.5) = Φ(0.5) = 0.6915.
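The two central-limit approximations in Question 7 follow the same recipe: add the means, add the variances, then standardise. A minimal sketch, with `normal_tail` as an illustrative helper name and the normal CDF computed via `math.erf` instead of tables:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_tail(threshold, mean, variance):
    """P(total > threshold) under a normal approximation."""
    return 1 - phi((threshold - mean) / sqrt(variance))

# Part (i): 100 days of exponential waits with mean 15 minutes (variance 15^2)
mean_x, var_x = 100 * 15, 100 * 15 ** 2
p_part_i = normal_tail(27 * 60, mean_x, var_x)

# Part (ii): a further 99 days with mean 10 minutes (variance 10^2), independent of X
mean_y, var_y = 99 * 10, 99 * 10 ** 2
p_part_ii = normal_tail(40 * 60, mean_x + mean_y, var_x + var_y)

print(round(p_part_i, 4), round(p_part_ii, 4))
```

Working in minutes throughout (27 hours = 1620 minutes, 40 hours = 2400 minutes) avoids unit slips when combining the two periods.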

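8 (i) $E(W) = E(\alpha \bar{X}_1 + (1 - \alpha)\bar{X}_2) = \alpha E(\bar{X}_1) + (1 - \alpha) E(\bar{X}_2) = \alpha\mu + (1 - \alpha)\mu = \mu$

Therefore W is unbiased.

(ii) MSE(W) = var(W) + {bias(W)}²

W is unbiased ∴ MSE(W) = var(W)

$= \mathrm{var}(\alpha \bar{X}_1 + (1 - \alpha)\bar{X}_2) = \alpha^2\,\mathrm{var}(\bar{X}_1) + (1 - \alpha)^2\,\mathrm{var}(\bar{X}_2)$ (independent samples)

$= \alpha^2 \frac{\sigma_1^2}{n} + (1 - \alpha)^2 \frac{\sigma_2^2}{n}$

(iii) $\frac{d\,\mathrm{MSE}}{d\alpha} = 2\alpha \frac{\sigma_1^2}{n} - 2(1 - \alpha)\frac{\sigma_2^2}{n}$

$\frac{d\,\mathrm{MSE}}{d\alpha} = 0 \Rightarrow (\sigma_1^2 + \sigma_2^2)\alpha = \sigma_2^2$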


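$\therefore \alpha = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$

$\frac{d^2\,\mathrm{MSE}}{d\alpha^2} = \frac{2\sigma_1^2}{n} + \frac{2\sigma_2^2}{n} > 0$ ∴ minimum

(iv) The maximum likelihood estimator of μ in the special case with $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is

$\hat{\mu} = \frac{\text{sum of observations}}{\text{number of observations}} = \frac{n\bar{X}_1 + n\bar{X}_2}{2n} = \frac{1}{2}\bar{X}_1 + \frac{1}{2}\bar{X}_2$

This is the same as W since

$\alpha = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} = \frac{\sigma^2}{\sigma^2 + \sigma^2} = \frac{1}{2} \Rightarrow W = \frac{1}{2}\bar{X}_1 + \frac{1}{2}\bar{X}_2.$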
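The optimal weight derived in Question 8 can be sanity-checked by brute force. In the sketch below the variances 4 and 1 and the sample size 10 are hypothetical values chosen purely for illustration (they do not come from the paper); the grid search should land on the closed-form weight.

```python
def mse(alpha, var1, var2, n):
    """MSE of W = alpha*Xbar1 + (1 - alpha)*Xbar2 for independent samples of size n."""
    return alpha ** 2 * var1 / n + (1 - alpha) ** 2 * var2 / n

# Hypothetical inputs, for illustration only
var1, var2, n = 4.0, 1.0, 10

# Closed-form optimum from part (iii)
alpha_star = var2 / (var1 + var2)

# Brute-force minimum over a grid of weights between 0 and 1
grid = [i / 1000 for i in range(1001)]
alpha_grid = min(grid, key=lambda a: mse(a, var1, var2, n))

print(alpha_star, alpha_grid, mse(alpha_star, var1, var2, n))
```

Note that the larger weight goes to the sample mean with the smaller variance, which is the intuition behind the formula.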

9 (i) E[E(Y|X)] = ∫ E[Y|X = x] f(x)dx = ∫ ∫ y f(y|x)dy f(x)dx = ∫ ∫ y f(y|x) f(x)dydx

but f(y|x) f(x) = f(x,y), the joint pdf of X and Y,

so E[E(Y|X)] = ∫ ∫ y f(x,y)dxdy = E[Y]

(ii) E[Y] = E[E(Y|X)] = E[X² + 1] = V[X] + {E[X]}² + 1 = 1 + 0 + 1 = 2

Var[Y] = Var[E(Y|X)] + E[Var(Y|X)] = Var[X² + 1] + E[X² + 1] (the conditional distribution is Poisson, so Var(Y|X) = E(Y|X) = X² + 1)

= Var[X²] + E[X²] + 1

but Z = X² is $\chi^2_1$, so has variance 2 and expectation 1.

Thus Var[Y] = 2 + 1 + 1 = 4
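The value Var[X²] = 2 used above can also be reached without quoting the χ² distribution, by using the standard fact that the fourth moment of a standard normal variable is 3. A short worked version, offered as a cross-check and not as the examiners' method:

```latex
\begin{align*}
\operatorname{Var}[X^2] &= E[X^4] - \{E[X^2]\}^2 = 3 - 1^2 = 2, \\
\operatorname{Var}[Y] &= \operatorname{Var}[X^2 + 1] + E[X^2 + 1] = 2 + (1 + 1) = 4.
\end{align*}
```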


10 (i) (a) Mgf of Xi is $(1 - t/\lambda)^{-4.5}$, so the mgf of $\sum_{i=1}^{n} X_i$ is

$\prod_{i=1}^{n} (1 - t/\lambda)^{-4.5} = (1 - t/\lambda)^{-4.5n}$

Hence the mgf of $2\lambda \sum_{i=1}^{n} X_i = 2\lambda n\bar{X}$ is $(1 - 2\lambda t/\lambda)^{-4.5n} = (1 - 2t)^{-4.5n}$

This is the mgf of a χ² variable with 9n degrees of freedom.

(b) $P(a < 2\lambda n\bar{X} < b) = 0.95 \Rightarrow P\left(\frac{a}{2n\bar{X}} < \lambda < \frac{b}{2n\bar{X}}\right) = 0.95$

where a and b are such that $P(\chi^2_{9n} < a) = 0.025$ and $P(\chi^2_{9n} > b) = 0.025$,

so a 95% CI for λ is given by $\left(\frac{a}{2n\bar{X}}, \frac{b}{2n\bar{X}}\right)$.

(c) 9n = 90, and from tables of χ² with 90 df we have a = 65.65, b = 118.1

CI is $\left(\frac{65.65}{2 \times 21.47}, \frac{118.1}{2 \times 21.47}\right)$ = (1.53, 2.75).
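The interval in (i)(c) is just the pivot inequality solved for λ. A small sketch, assuming the χ² percentage points 65.65 and 118.1 are taken from tables as in the solution; `rate_interval` is an illustrative helper name.

```python
def rate_interval(sum_x, chi2_lower, chi2_upper):
    """Invert a < 2*lambda*sum_x < b to obtain an interval for lambda."""
    return chi2_lower / (2 * sum_x), chi2_upper / (2 * sum_x)

n, shape = 10, 4.5
df = int(2 * shape * n)   # degrees of freedom of the pivot 2*lambda*n*Xbar

a, b = 65.65, 118.1       # lower and upper 2.5% points of chi-square(90), from tables
sum_x = 21.47             # observed sum of the 10 observations (n times the sample mean)

lower, upper = rate_interval(sum_x, a, b)
print(df, round(lower, 2), round(upper, 2))
```

Because the pivot depends on the data only through the sum of the observations, the sample mean never has to be computed separately.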

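(ii) (a) $L(\lambda) \propto \lambda^{4.5n} \exp\left(-\lambda \sum x_i\right)$, so $\ell(\lambda) = 4.5n \log\lambda - \lambda \sum x_i + \text{constant}$

$\Rightarrow \frac{d\ell}{d\lambda} = \frac{4.5n}{\lambda} - \sum x_i$. Setting $\frac{d\ell}{d\lambda} = 0 \Rightarrow \hat{\lambda} = \frac{4.5n}{\sum X_i} = \frac{4.5}{\bar{X}}$

(b) $-\frac{d^2\ell}{d\lambda^2} = \frac{4.5n}{\lambda^2}$, so $\mathrm{s.e.}(\hat{\lambda}) \cong \frac{\hat{\lambda}}{(4.5n)^{1/2}}$

(c) 95% CI is $\hat{\lambda} \pm \{1.96 \times \mathrm{s.e.}(\hat{\lambda})\}$

In the case n = 100, Σx = 225.3,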


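$\hat{\lambda} = 4.5/2.253 = 1.9973$ and $\mathrm{s.e.}(\hat{\lambda}) \cong \frac{4.5/2.253}{(450)^{1/2}} = 0.0942$

so CI is 1.9973 ± (1.96 × 0.0942) i.e. (1.81, 2.18).

11 (i) Maturity values for high street banks and other banks [Figure: dotplot on a common axis from 11.4 to 12.2, with one row for other banks and one for high street banks]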

(ii) x1: maturity value for high street bank; x2: maturity value for other bank

$\bar{x}_1 = \frac{93.20}{8} = 11.650$

$\bar{x}_2 = \frac{141.78}{12} = 11.815$

$s_1^2 = \frac{1}{7}\left(1086.0470 - \frac{93.20^2}{8}\right) = 0.038143$

$s_2^2 = \frac{1}{11}\left(1675.8224 - \frac{141.78^2}{12}\right) = 0.062882$

Pooled estimate of σ²:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(0.038143) + 11(0.062882)}{18} = 0.053261$

$\therefore s_p = 0.2308$

95% confidence interval for μ1 − μ2 is

$11.650 - 11.815 \pm t_{18}(2\tfrac{1}{2}\%)\, s_p \sqrt{\frac{1}{8} + \frac{1}{12}}$

$= -0.165 \pm (2.101)(0.2308)\sqrt{\frac{1}{8} + \frac{1}{12}}$


i.e. −0.165 ± 0.2213, i.e. (−0.386, 0.056)

i.e. the confidence interval for the difference between the means for high street banks and other banks (μ1 − μ2) is −£386 to £56.

As zero is within the confidence interval, there is insufficient evidence, at the 5% level, to reject the null hypothesis that the mean maturity values do not differ for the accounts offered by high street banks and other banks.
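The pooled two-sample interval above can be rebuilt directly from the summary statistics given in the question. This is a sketch under the same assumptions as the solution (normality, equal variances), with the critical value 2.101 taken from tables; `sample_var` is an illustrative helper name.

```python
from math import sqrt

def sample_var(n, sum_x, sum_x2):
    """Unbiased sample variance from n, the sum and the sum of squares."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Summary statistics from the question (units of 1,000 pounds)
n1, sum1, sumsq1 = 8, 93.20, 1086.0470      # high street banks
n2, sum2, sumsq2 = 12, 141.78, 1675.8224    # other banks
t_crit = 2.101                               # t18(0.025), read from tables

s1_sq = sample_var(n1, sum1, sumsq1)
s2_sq = sample_var(n2, sum2, sumsq2)

# Pooled standard deviation on n1 + n2 - 2 degrees of freedom
s_pooled = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

diff = sum1 / n1 - sum2 / n2
half_width = t_crit * s_pooled * sqrt(1 / n1 + 1 / n2)
ci = (diff - half_width, diff + half_width)
print(round(s1_sq, 6), round(s2_sq, 6), round(s_pooled, 4), [round(v, 3) for v in ci])
```

Since the interval straddles zero, the code reaches the same conclusion as the written solution: no evidence of a difference at the 5% level.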

(iii) (a) $\frac{S_2^2}{S_1^2} \sim F_{11,7}$ under the assumption that the variances are equal for high street and other banks, i.e. H0: $\sigma_1^2 = \sigma_2^2$

$\frac{s_2^2}{s_1^2} = \frac{0.062882}{0.038143} = 1.65$

We cannot reject the null hypothesis at the 5% level as the two-sided critical value of a 5% level test is approximately 4.71 (by interpolation using 2½% one-sided F table in Yellow Book).

[OR probability value is p > 0.20 as a two-sided 20% level test has a critical value of approximately 2.69.]

(b) The plot in (i) indicates that the assumption of a normal distribution for maturity values is reasonable (but small samples) for both high street and other banks. The assumption of equal variance also seems valid as the test in (iii)(a) is not significant (and the plot above supports this).

(iv) Adding points for building societies to previous plot in (i).

[Figure: dotplot of maturity values on a common axis from 11.4 to 12.4, with rows for building societies, other banks and high street banks]


(v) Σx = 93.20 + 141.78 + 143.83 = 378.81

Σx² = 1086.0470 + 1675.8224 + 1724.2449 = 4486.114

SST = $4486.114 - \frac{378.81^2}{32}$ = 1.832

SSB = $\frac{93.20^2}{8} + \frac{141.78^2}{12} + \frac{143.83^2}{12} - \frac{378.81^2}{32}$ = 0.551

∴ SSR = SST − SSB = 1.832 − 0.551 = 1.281

Analysis of variance table

Source of variation           df    SS      MSS
Financial institution types    2    0.551   0.276
Residual                      29    1.281   0.044
Total                         31    1.832

$F = \frac{0.276}{0.044} = 6.27$ on (2, 29) degrees of freedom

F2,29 (5%) = 3.328 and F2,29 (1%) = 5.42

Reject H0: μ1 = μ2 = μ3 (population means are equal) at 1% level. Strong evidence of differences between the 3 financial institutions.

The plot shows nothing strong enough to invalidate the assumptions of normality and equal variances, even though the variability for the building societies is a bit smaller than for the banks.
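The sums of squares in the ANOVA table can be recomputed from the group summary statistics given in the question. The sketch below uses illustrative variable names and keeps full precision, so its F ratio comes out near 6.24; the solution's 6.27 arises from dividing the rounded mean squares 0.276 and 0.044.

```python
# (n, sum, sum of squares) for each type of institution, as given in the question
groups = {
    "high street banks": (8, 93.20, 1086.0470),
    "other banks": (12, 141.78, 1675.8224),
    "building societies": (12, 143.83, 1724.2449),
}

n_total = sum(n for n, _, _ in groups.values())
grand_sum = sum(s for _, s, _ in groups.values())
grand_sum_sq = sum(q for _, _, q in groups.values())
correction = grand_sum ** 2 / n_total

# Total, between-groups and residual sums of squares
ss_total = grand_sum_sq - correction
ss_between = sum(s ** 2 / n for n, s, _ in groups.values()) - correction
ss_resid = ss_total - ss_between

df_between = len(groups) - 1
df_resid = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_resid / df_resid)

print(round(ss_total, 3), round(ss_between, 3), round(ss_resid, 3), round(f_stat, 2))
```

Either version of the F ratio exceeds the 1% critical value 5.42 quoted above, so the conclusion is unaffected by the rounding.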

(vi) Part (ii) indicates that there are no differences between the mean maturity values of the two types of bank, but (v) indicates that there are differences between the mean maturity values of the 3 types of financial institution. Therefore, in conclusion, it seems that the mean maturity value for building societies is not equal to the mean maturity values of the banks. Also, the plot in (iv) suggests that the mean maturity value for building societies is higher than the mean maturity values for the banks.


12 (i)

There is a suggestion of a curve but linear regression might still be reasonable.

(ii) n = 7

Sxx = Σx² − (Σx)²/n = 14000 − (280)²/7 = 2800

Syy = Σy² − (Σy)²/n = 11951 − (241)²/7 = 3653.714

Sxy = Σxy − (Σx)(Σy)/n = 12790 − (280)(241)/7 = 3150

Model: E[Y] = α + βx

Slope: $\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{3150}{2800} = 1.125$

Intercept: $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ = 241/7 − (1.125)(280/7) = −10.571

The equation of the least-squares fitted regression line is:

Distance = −10.571 + 1.125 Speed
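The fitted line above can be reproduced from the raw data in the question using the same corrected sums of squares. A minimal sketch in plain Python; the variable names are illustrative.

```python
# Data from Question 12
speed = [10, 20, 30, 40, 50, 60, 70]        # x, mph
distance = [5, 10, 23, 34, 40, 54, 75]      # y, metres

n = len(speed)
sum_x, sum_y = sum(speed), sum(distance)

# Corrected sums of squares and products
s_xx = sum(x * x for x in speed) - sum_x ** 2 / n
s_yy = sum(y * y for y in distance) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in zip(speed, distance)) - sum_x * sum_y / n

# Least-squares estimates
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(s_xx, round(s_yy, 3), s_xy, slope, round(intercept, 3))
```

Computing the sums from the raw data also confirms the totals Σx², Σy² and Σxy printed in the question paper.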

[Figure: Plot of stopping distance against speed; horizontal axis: Speed (10 to 70), vertical axis: Stopping distance (0 to 80)]


(iii) $\hat{\sigma}^2 = \frac{1}{n - 2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{5}\left(3653.714 - \frac{(3150)^2}{2800}\right) = 21.99$

$\mathrm{s.e.}(\hat{\beta}) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} = \sqrt{\frac{21.99}{2800}} = 0.0886$

95% confidence interval for slope: $\hat{\beta} \pm t_{n-2}(0.025)\,\mathrm{s.e.}(\hat{\beta})$ (df = n − 2 = 5)

= 1.125 ± (2.571)(0.0886) = 1.125 ± 0.228 or (0.897, 1.353)

β = 1 is within this 95% confidence interval, therefore we would not reject the null hypothesis β = 1 at the 5% significance level.

(iv) When x = 50: y = −10.571 + 1.125(50) = 45.7 m

When x = 100: y = −10.571 + 1.125(100) = 101.9 m

The stopping distance of 45.7 m when the speed is 50 mph can be regarded as a reliable estimate as x = 50 is well within the range of the x data values.

However, the stopping distance for a speed of 100 mph may be unreliable as x = 100 is outside the range of the data and involves extrapolation.
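The slope interval and the two predictions can be checked from the summary totals in the question. This sketch takes the critical value 2.571 from tables, as the solution does, and adds a simple range check to flag extrapolation; the names `predict` and `within_data_range` are illustrative.

```python
from math import sqrt

# Summary totals from Question 12
n = 7
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 280, 241, 14000, 11951, 12790
x_min, x_max = 10, 70     # range of observed speeds
t_crit = 2.571            # t5(0.025), read from tables

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

# Error variance estimate and standard error of the slope
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
se_slope = sqrt(sigma2_hat / s_xx)
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)

def predict(x):
    """Fitted stopping distance (metres) at speed x (mph)."""
    return intercept + slope * x

def within_data_range(x):
    """True when x lies inside the observed speeds, i.e. no extrapolation."""
    return x_min <= x <= x_max

for x in (50, 100):
    print(x, round(predict(x), 1), within_data_range(x))
print(round(sigma2_hat, 2), round(se_slope, 4), [round(v, 3) for v in ci])
```

The range check mirrors the written comment: the prediction at 50 mph is an interpolation, while the one at 100 mph lies well outside the observed speeds.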

END OF EXAMINERS’ REPORT


Faculty of Actuaries Institute of Actuaries

EXAMINATION

23 April 2007 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 13 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

© Faculty of Actuaries CT3 A2007 © Institute of Actuaries


1 Consider the following two random samples of ten observations which come from the distributions of random variables which assume non-negative integer values only.

Sample 1: 7 4 6 11 5 9 8 3 5 5
sample mean = 6.3, sample variance = 6.01

Sample 2: 8 3 5 11 2 4 6 12 3 9
sample mean = 6.3, sample variance = 12.46

One sample comes from a Poisson distribution, the other does not.

State, with brief reasons, which sample you think is likely to be which. [2]

2 A random sample of 200 policy surrender values (in units of £1,000) yields a mean of 43.6 and a standard deviation of 82.2.

Determine a 99% confidence interval for the true underlying mean surrender value for such policies. [3]

3 It is assumed that claims on a certain type of policy arise as a Poisson process with claim rate λ per year.

For a group of 150 independent policies of this type, the total number of claims during the last calendar year was recorded as 123.

Determine an approximate 95% confidence interval for the true underlying annual claim rate for such a policy. [4]

4 The sample correlation coefficient for the set of data consisting of the three pairs of values (−1,−2), (0,0), (1,1) is 0.982.

After the x and y values have been transformed by particular linear functions, the data become: (2,2), (6,−4), (10,−7).

State (or calculate) the correlation coefficient for the transformed data. [2]


5 The number of claims arising in one year from a group of policies follows a Poisson distribution with mean 12. The claim sizes independently follow an exponential distribution with mean £80 and they are independent of the number of claims.

The current financial year has six months remaining.

Calculate the mean and the standard deviation of the total claim amount which arises during this remaining six months. [4]

6 Consider the discrete random variable X with probability function

$f(x) = \frac{4}{5^{x+1}}, \quad x = 0, 1, 2, \ldots$

(i) Show that the moment generating function of the distribution of X is given by

$M_X(t) = 4(5 - e^t)^{-1}$, for $e^t < 5$. [3]

(ii) Determine E[X] using the moment generating function given in part (i). [3]

[Total 6]

7 A charity issues a large number of certificates each costing £10 and each being repayable one year after issue. Of these certificates, 1% are randomly selected to receive a prize of £10 such that they are repaid as £20. The remaining 99% are repaid at their face value of £10.

(i) Show that the mean and standard deviation of the sum repaid for a single purchased certificate are £10.1 and £0.995 respectively. [2]

Consider a person who purchases 200 of these certificates.

(ii) Calculate approximately the probability that this person is repaid more than £2,040 by using the Central Limit Theorem applied to the total sum repaid. [3]

(iii) An alternative approach to approximating the probability in (ii) above is based on the number of prize certificates the person is found to hold. This number will follow a binomial distribution.

Use a Poisson approximation to this binomial distribution to approximate the probability that this person is repaid more than £2,040. [3]

(iv) Comment briefly on the comparison of the two approximations above given that the exact probability using the binomial distribution is 0.0517. [1]

[Total 9]


8 A random sample of size n is taken from a distribution with probability density function

$f(x) = \frac{\alpha}{(1 + x)^{\alpha + 1}}, \quad 0 < x < \infty$

where α is a parameter such that α > 0.

(i) Show by evaluating the appropriate integral that, in the case α > 1, the mean of this distribution is given by $\frac{1}{\alpha - 1}$.

[Hint: when integrating, write x = (1 + x) − 1 and exploit the fact that the integral of a density function is unity over its full range.] [3]

(ii) Determine the method of moments estimator of α. [2]

[Total 5]

9 Consider three random variables X, Y, and Z with the same variance σ² = 4. Suppose that X is independent of both Y and Z, but Y and Z are correlated, with correlation coefficient ρYZ = 0.5.

(i) Calculate the covariance between X and U, where U = Y + Z. [1]

(ii) Calculate the covariance between Z and V, where V = 3X − 2Y. [2]

(iii) Calculate the variance of W, where W = 3X − 2Y + Z. [2]

[Total 5]

10 A random sample of insurance policies of a certain type was examined for each of four insurance companies and the sums insured (yij, for companies i = 1, 2, 3, 4) under each policy are given in the table below (in units of £100):

Company    Sums insured                      Total
1          58.2  57.2  58.4  55.8  54.9      284.5
2          56.3  54.5  57.0  55.3            223.1
3          50.1  54.2  55.5                  159.8
4          52.9  49.9  50.0  51.7            204.5

For these data, $\sum_i \sum_j y_{ij} = 871.9$ and $\sum_i \sum_j y_{ij}^2 = 47{,}633.53$


Consider the ANOVA model $Y_{ij} = \mu + \tau_i + e_{ij}$, $i = 1, \ldots, 4$, $j = 1, \ldots, n_i$, where Yij is the jth sum insured for company i, ni is the number of responses for company i, $e_{ij} \sim N(0, \sigma^2)$ are independent errors, and $\sum_{i=1}^{4} n_i \tau_i = 0$.

(i) Calculate estimates of the parameters μ and $\tau_i$, i = 1, 2, 3, 4. [3]

(ii) Test the hypothesis that there are no differences in the means of the sums insured under such policies by the four companies. [5]

[Total 8]

11 The number of claims, X, which arise in a year on each policy of a particular class is to be modelled as a Poisson random variable with mean λ. Let X = (X1, X2, …, Xn) be a random sample of size n from the distribution of X, and let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

Suppose that it is required to estimate λ, the mean number of claims on a policy.

(i) Show that $\hat{\lambda}$, the maximum likelihood estimator of λ, is given by $\hat{\lambda} = \bar{X}$. [3]

(ii) Derive the Cramer-Rao lower bound (CRlb) for the variance of unbiased estimators of λ. [4]

(iii) (a) Show that $\hat{\lambda}$ is unbiased for λ and that it attains the CRlb.

(b) Explain clearly why, in the case that n is large, the distribution of $\hat{\lambda}$ can be approximated by

$\hat{\lambda} \sim N\left(\lambda, \frac{\lambda}{n}\right)$. [3]

(iv) (a) Show that, in the case n = 100, an approximate 95% confidence interval for λ is given by $\bar{x} \pm 0.196\sqrt{\bar{x}}$.

(b) Evaluate the confidence interval in (iv)(a) based on a sample with the following composition:

observation    0   1   2   3   4   5   6   7
frequency     11  28  19  28   9   2   2   1

[6]

[Total 16]


12 An insurance company is investigating past data for two household claims assessors, A and B, used by the company. In particular claims resulting from similar types of water damage were extracted. The following table shows the assessors’ initial estimates of the cost (in units of £100) of meeting each claim.

A: 4.6 6.6 2.8 5.8 2.1 5.2 5.9 3.4 7.8 3.5 1.6 8.6 2.7
B: 5.7 3.4 4.7 3.6 6.5 3.3 3.8 2.4 7.0 4.0 4.4

For the A data: nA = 13, Σx = 60.6 and Σx² = 340.92
For the B data: nB = 11, Σx = 48.8 and Σx² = 236.80

(i) Draw a suitable diagram to represent these data so that the initial estimates of the two assessors can be compared. [2]

(ii) You are required to perform an appropriate test to compare the means of the assessors’ initial estimates for this type of water damage.

(a) State your hypotheses clearly.

(b) Use your diagram in part (i) to comment briefly on the validity of your test.

(c) Calculate your test statistic and specify the resulting P-value.

(d) State your conclusion clearly. [10]

(iii) You are required to perform an appropriate test to compare the variances of the assessors’ initial estimates.

(a) State your hypotheses clearly.

(b) Use your diagram in part (i) to comment briefly on the validity of your test.

(c) Calculate your test statistic and specify the resulting P-value.

(d) State your conclusion clearly and hence comment further on the validity of your test in part (ii). [7]

(iv) Use your answers in parts (i) to (iii) to comment on the overall comparison of the two assessors as regards their initial estimates for this type of water damage. [1]

[Total 20]


13 In a study of the relation between the amount of information available and use of buses in eight comparable test cities, bus route maps were given to residents of the cities at the beginning of the test period. The increase in average daily bus use during the test period was recorded. The numbers of maps and the increase in bus use are given in the table below (both in thousands).

Number of maps (x)         80    220   140   120   180   100   200   160
Increase in bus use (y)    0.60  6.70  5.30  4.00  6.55  2.15  6.60  5.75

For these data: Σx = 1,200, Σx² = 196,800, Σy = 37.65, Σy² = 213.4875, Σxy = 6,378

(i) Construct a scatterplot of the data and comment on the relationship between the increase in bus use and the number of maps distributed. [4]

(ii) The equation of the fitted linear regression is given by $\hat{y} = -1.816 + 0.04348x$.

Perform an appropriate statistical test to assess the hypothesis that the slope in this fitted model suggests no relationship between the increase in bus use and the number of maps distributed. Any assumptions made should be clearly stated. [6]

(iii) The fitted responses and the residuals from the linear regression model fitted in part (ii) are given below:

Fitted values (ŷ)    1.66   7.75   4.27   3.40   6.01   2.53   6.88   5.14
Residuals (ê)       -1.06  -1.05   1.03   0.60   0.54  -0.38  -0.28   0.61

Plot the residuals against the values of the fitted responses and comment on the adequacy of the model. [4]

(iv) A new city is added to the study, and 250,000 maps are distributed to its citizens.

Calculate the prediction of the increase in bus use in this city according to the model fitted in part (ii) and comment on the validity of this prediction. [2]

[Total 16]

END OF PAPER

Faculty of Actuaries Institute of Actuaries

EXAMINATION

April 2007

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

M A Stocker
Chairman of the Board of Examiners
June 2007

Comments

Comments are given in the solutions that follow. Note that in some cases variations on the solutions given are possible; the examiners gave credit for all sensible comments and correct solutions.

The paper was well-answered overall and there are no particular topics that stand out as being poorly attempted. Similarly there were no particular misunderstandings evident widely, and no particular errors were made repeatedly enough to be worthy of comment.

In Question 13(iii) candidates were asked to plot a given set of residuals and comment. There was in fact a negative sign missing from the first residual (quoted as 1.6). The error was noted before marking commenced. No candidate was disadvantaged: all answers using 1.06 or -1.06 were accepted as being equally valid. There was no evidence in the scripts of any problem for candidates. The examiners wish to apologise for the minor error.

© Faculty of Actuaries © Institute of Actuaries

Subject CT3 (Probability and Mathematical Statistics Core Technical) — April 2007 — Examiners’ Report

1 A Poisson random variable has mean = variance and this will be reflected in the sample mean and variance for a random sample.

Sample 2 has a very much higher variance than mean, whereas sample 1 has mean

and variance approximately the same, so sample 1 is likely to be the one which comes from a Poisson distribution.

2 Approximate large sample confidence interval for the mean is given by

$\bar{x} \pm z_{\alpha/2} \, \frac{s}{\sqrt{n}}$

for 99% CI, $z_{\alpha/2} = 2.5758$

leading to $43.6 \pm 2.5758 \times \frac{82.2}{\sqrt{200}} \Rightarrow 43.6 \pm 15.0$

or (28.6, 58.6) or (£28,600, £58,600)
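
The interval in Question 2 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name large_sample_ci is an illustrative choice and not part of the examiners' report, and the normal quantile is computed rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(mean, sd, n, level):
    """Approximate CI: mean +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper alpha/2 point of N(0,1)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Values used in the solution: x-bar = 43.6, s = 82.2, n = 200, 99% level
lower, upper = large_sample_ci(43.6, 82.2, 200, 0.99)
print(round(lower, 1), round(upper, 1))
```

The printed limits should agree with the interval quoted in the solution to one decimal place.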

3 The mean number of claims per policy is $\bar{X} = \frac{123}{150} = 0.82$

Using the normal approximation to the Poisson distribution

approximate 95% confidence interval for $\lambda$ is $\bar{X} \pm 1.96\sqrt{\frac{\bar{X}}{n}}$

$\rightarrow 0.82 \pm 1.96\sqrt{\frac{0.82}{150}} \rightarrow 0.82 \pm 1.96(0.0739)$

$\rightarrow 0.82 \pm 0.145$ or (0.675, 0.965)

4 Answer = −0.982 (The relationship is now a negative one; the only change is the sign. An answer of

+0.982 gets 1 mark.)

5 Let N = number of claims in the six months
Let X = a single claim size
Let S = sum of claim sizes for the six months

Then N ~ Poisson(6)

$E(S) = E(N)E(X) = (6)(80) = 480$, i.e. £480

$V(S) = E(N)V(X) + V(N)[E(X)]^2 = (6)(80^2) + (6)(80^2) = 76800$

$\therefore sd(S) = 277$, i.e. £277

6 (i) $M_X(t) = E[e^{tX}]$

$= \sum_{x=0}^{\infty} e^{tx} \, \frac{4}{5}\left(\frac{1}{5}\right)^x = \frac{4}{5}\sum_{x=0}^{\infty}\left(\frac{e^t}{5}\right)^x$,

and for $e^t < 5$,

$M_X(t) = \frac{4}{5} \cdot \frac{1}{1 - e^t/5} = 4(5 - e^t)^{-1}$.

(ii) $M'(t) = 4e^t(5 - e^t)^{-2}$

Mean is given by $E(X) = M'(0)$

$\therefore E[X] = 4e^0(5 - e^0)^{-2} = \frac{1}{4}$.

[OR, by expansion as a power series.]

7 (i) Let X = the sum repaid for a single certificate.

$E(X) = 10(0.99) + 20(0.01) = 10.1$
$E(X^2) = 10^2(0.99) + 20^2(0.01) = 103$
$\therefore V(X) = 103 - 10.1^2 = 0.99 \quad \therefore sd(X) = 0.9950$

(ii) Let S = the sum repaid for 200 certificates.

$\therefore E(S) = 200(10.1) = 2020$, $V(S) = 200(0.99) = 198 \quad \therefore sd(S) = 14.07$

$P(S > 2040) = P\left(Z > \frac{2040 - 2020}{14.07}\right) = P(Z > 1.42) = 1 - 0.9222 = 0.0778$

(iii) $N \sim \text{binomial}(200, 0.01) \approx \text{Poisson}(2)$

$P(S > 2040) = P(N > 4) = 1 - P(N \le 4) = 1 - 0.94735 = 0.0527$

(iv) Clearly the Poisson approximation to the binomial is better than the Central Limit Theorem approximation.

OR: Since S is discrete and increases in steps of 10, one can argue for the use of a continuity correction in (ii) above:

$P(S > 2040) = P\left(Z \ge \frac{2045 - 2020}{14.07}\right) = P(Z > 1.78) = 1 - 0.96246 = 0.0375$

(Either approach is acceptable for the marks.)
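
To see how the approximations in Question 7 compare with the exact binomial tail, here is a rough sketch in standard-library Python. It assumes, as the solution does, that S > 2040 is the same event as N > 4, where N counts the certificates repaid at the higher amount; the variable names are illustrative only.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

n, p = 200, 0.01      # number of certificates, probability of the higher repayment
threshold = 4         # S > 2040 corresponds to N > 4

# Exact binomial tail P(N > 4)
exact = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold + 1))

# Poisson(2) approximation to the binomial
lam = n * p
poisson = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(threshold + 1))

# Normal (CLT) approximation for S, without and with a continuity correction
mean_s = n * 10.1
sd_s = sqrt(n * 0.99)
std_normal = NormalDist()
normal_plain = 1 - std_normal.cdf((2040 - mean_s) / sd_s)
normal_cc = 1 - std_normal.cdf((2045 - mean_s) / sd_s)

for label, value in [("exact", exact), ("Poisson", poisson),
                     ("normal", normal_plain), ("normal + c.c.", normal_cc)]:
    print(f"{label:14s} {value:.4f}")
```

The normal figures differ slightly from the solution in the fourth decimal place because the solution rounds z to two decimals before using tables.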

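8 (i) Mean $= \int_0^\infty \frac{\alpha x}{(1+x)^{\alpha+1}}\,dx$

$= \int_0^\infty \frac{\alpha(1+x)}{(1+x)^{\alpha+1}}\,dx - \int_0^\infty \frac{\alpha}{(1+x)^{\alpha+1}}\,dx$

$= \frac{\alpha}{\alpha-1}\int_0^\infty \frac{\alpha-1}{(1+x)^{(\alpha-1)+1}}\,dx - 1$

$= \frac{\alpha}{\alpha-1} - 1 = \frac{1}{\alpha-1}$

(ii) Equate population mean to sample mean: $\frac{1}{\alpha-1} = \bar{x}$

Solve to get $\alpha = 1 + \frac{1}{\bar{x}}$, so MME $= 1 + \frac{1}{\bar{X}}$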

9 (i) $Cov(X, Y+Z) = Cov(X,Y) + Cov(X,Z) = 0$

[Note: The simple statement of the answer "0" is acceptable for the single mark available.]

(ii) $Cov(Z, 3X - 2Y) = 3Cov(Z,X) - 2Cov(Z,Y) = 0 - 2\rho_{YZ}\sigma^2 = -2 \times 0.5 \times 4 = -4$

(iii) $V[3X - 2Y + Z] = (9 + 4 + 1)\sigma^2 - 12Cov(X,Y) + 6Cov(X,Z) - 4Cov(Y,Z) = 14(4) - 4(0.5)(4) = 48$
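
The results in Question 9 are instances of the bilinear form a'Σb for linear combinations. The sketch below assumes the values implied by the solution (common variance 4, correlation 0.5 between Y and Z, and X uncorrelated with both); the helper name covariance_of is hypothetical.

```python
sigma2 = 4.0   # common variance, as implied by the solution
rho_yz = 0.5   # correlation between Y and Z

# Covariance matrix of (X, Y, Z), assuming X is uncorrelated with Y and Z
cov = [
    [sigma2, 0.0, 0.0],
    [0.0, sigma2, rho_yz * sigma2],
    [0.0, rho_yz * sigma2, sigma2],
]


def covariance_of(a, b):
    """Cov(a'W, b'W) = a' * cov * b for coefficient vectors a and b."""
    return sum(a[i] * cov[i][j] * b[j] for i in range(3) for j in range(3))


cov_x_yz = covariance_of([1, 0, 0], [0, 1, 1])       # part (i)
cov_z_lin = covariance_of([0, 0, 1], [3, -2, 0])     # part (ii)
var_lin = covariance_of([3, -2, 1], [3, -2, 1])      # part (iii)
print(cov_x_yz, cov_z_lin, var_lin)
```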

10 (i) $\hat{\mu} = \bar{Y}_{..} = \frac{871.9}{16} = 54.494$

$\hat{\tau}_1 = \bar{Y}_{1.} - \bar{Y}_{..} = \frac{284.5}{5} - \frac{871.9}{16} = 2.406$

$\hat{\tau}_2 = \bar{Y}_{2.} - \bar{Y}_{..} = \frac{223.1}{4} - \frac{871.9}{16} = 1.281$

$\hat{\tau}_3 = \bar{Y}_{3.} - \bar{Y}_{..} = \frac{159.8}{3} - \frac{871.9}{16} = -1.227$

$\hat{\tau}_4 = \bar{Y}_{4.} - \bar{Y}_{..} = \frac{204.5}{4} - \frac{871.9}{16} = -3.369$

(ii) $SS_T = \sum_i \sum_j y_{ij}^2 - \frac{Y_{..}^2}{n} = 120.430$

$SS_B = \sum_i n_i(\bar{Y}_{i.} - \bar{Y}_{..})^2 = (5 \times 2.406^2) + (4 \times 1.281^2) + (3 \times 1.227^2) + (4 \times 3.369^2) = 85.425$

[OR: $SS_B = \sum_i \frac{Y_{i.}^2}{n_i} - \frac{Y_{..}^2}{n} = 85.428$]

$SS_R = SS_T - SS_B = 35.002$

The ANOVA table is:

Source                          DF    SS        MS       F
Company (between treatments)     3    85.428    28.476   9.763
Residual                        12    35.002     2.917
Total                           15   120.430

At the 5% significance level, $F_{0.05,3,12} = 3.490$ (or $F_{0.01,3,12} = 5.953$).

Since F = 9.763 > 3.490, there is evidence against the null hypothesis, and we conclude that there are differences in the mean sums insured by the companies.
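
The ANOVA arithmetic in Question 10 can be reproduced from the company totals alone. This is a small sketch, not the examiners' method; it takes the total sum of squares 120.430 as quoted in the solution (the raw observations are not reproduced here) and uses illustrative variable names.

```python
totals = [284.5, 223.1, 159.8, 204.5]   # company totals Y_i.
sizes = [5, 4, 3, 4]                    # observations per company n_i
ss_total = 120.430                      # SS_T as quoted in the solution

n = sum(sizes)
grand_total = sum(totals)

# Between-treatments sum of squares: sum(Y_i.^2 / n_i) - Y..^2 / n
ss_between = sum(t * t / m for t, m in zip(totals, sizes)) - grand_total**2 / n
ss_resid = ss_total - ss_between

df_between = len(totals) - 1
df_resid = n - len(totals)
f_stat = (ss_between / df_between) / (ss_resid / df_resid)
print(round(ss_between, 3), round(ss_resid, 3), round(f_stat, 3))
```

The F value is then compared with the tabulated 5% point 3.490 on (3, 12) degrees of freedom, as in the solution.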

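11 (i) $L(\lambda) = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod x_i!}$

$\Rightarrow \ell(\lambda) = \log L(\lambda) = -n\lambda + \left(\sum x_i\right)\log\lambda + \text{constant}$

$\Rightarrow \frac{d\ell}{d\lambda} = -n + \frac{\sum x_i}{\lambda} = 0 \Rightarrow \hat{\lambda} = \frac{\sum X_i}{n} = \bar{X}$

(ii) $\frac{d^2\ell}{d\lambda^2} = -\frac{\sum x_i}{\lambda^2}$

$\Rightarrow E\left[-\frac{d^2\ell}{d\lambda^2}\right] = \frac{1}{\lambda^2}E\left[\sum X_i\right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}$

$\Rightarrow$ CRlb $= \frac{\lambda}{n}$.

(iii) (a) $E[\hat{\lambda}] = E[\bar{X}] = E[X] = \lambda$

$V[\hat{\lambda}] = V[\bar{X}] = \frac{V[X]}{n} = \frac{\lambda}{n}$, which is CRlb.

(b) The theory of asymptotic distributions of MLEs (and in this case the CLT) gives $\hat{\lambda} \sim N$ approximately, for large n, so $\hat{\lambda} \sim N\left(\lambda, \frac{\lambda}{n}\right)$, approximately.

(iv) (a) Large sample approximate 95% CI for $\lambda$ is given by $\hat{\lambda} \pm 1.96 \times s.e.(\hat{\lambda})$, i.e. $\bar{x} \pm 1.96 \times s.e.(\bar{x})$

$s.e.(\hat{\lambda}) = \sqrt{\frac{\lambda}{n}}$, which we can estimate by using $\bar{x}$ to estimate $\lambda$, giving the estimated standard error $e.s.e.(\hat{\lambda}) = \sqrt{\frac{\bar{x}}{n}}$

With n = 100, we get the 95% CI as $\bar{x} \pm 1.96 \times \sqrt{\frac{\bar{x}}{100}}$, i.e. $\bar{x} \pm 0.196\sqrt{\bar{x}}$.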

(b) $\bar{x} = 215/100 = 2.15$

CI is $2.15 \pm 0.196(2.15)^{1/2}$, i.e. $2.15 \pm 0.287$, i.e. (1.86, 2.44)

12 (i) Dotplots on the same scale are most suitable [alternatively boxplots are acceptable].

(ii) (a) Let $\mu_A$ = mean initial estimate for this type of water damage for assessor A and $\mu_B$ = mean initial estimate for this type of water damage for assessor B.

$H_0: \mu_A = \mu_B$ v $H_1: \mu_A \ne \mu_B$

(b) Dotplots show that normality assumption is reasonably valid. Dotplots perhaps cast doubt on equal variances assumption.

(c) Test statistic is $t = \frac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} \sim t_{n_A + n_B - 2}$ under $H_0$

From data: $\bar{x}_A = \frac{60.6}{13} = 4.662$, $s_A^2 = \frac{1}{12}\left(340.92 - \frac{60.6^2}{13}\right) = 4.8692$

$\bar{x}_B = \frac{48.8}{11} = 4.436$, $s_B^2 = \frac{1}{10}\left(236.80 - \frac{48.8^2}{11}\right) = 2.0305$

and $s_p^2 = \frac{12(4.8692) + 10(2.0305)}{22} = 3.5789 \quad \therefore s_p = 1.8918$

observed $t = \frac{4.662 - 4.436}{1.8918\sqrt{\frac{1}{13} + \frac{1}{11}}} = \frac{0.226}{0.775} = 0.29$ on 22 df

Clearly P-value is very large, or noting that $t_{22}(40\%) = 0.2564$, then P-value is just a bit less than 0.8.

(d) So there is no evidence at all of any difference between assessors A and B as regards their mean initial estimates for this type of water damage.
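
The two-sample calculations in Question 12 depend only on the summary totals given in the question. Below is a minimal sketch that recomputes the pooled t statistic (and, for use in part (iii), the variance ratio) from those totals; the function name summary is illustrative, and P-values are left to tables as in the solution.

```python
from math import sqrt


def summary(n, sum_x, sum_x2):
    """Sample mean and sample variance from n, sum(x) and sum(x^2)."""
    mean = sum_x / n
    var = (sum_x2 - sum_x**2 / n) / (n - 1)
    return mean, var


n_a, n_b = 13, 11
mean_a, var_a = summary(n_a, 60.6, 340.92)
mean_b, var_b = summary(n_b, 48.8, 236.80)

# Pooled variance and two-sample t statistic on n_a + n_b - 2 df
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t_stat = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Variance ratio on (n_a - 1, n_b - 1) df
f_stat = var_a / var_b
print(round(t_stat, 2), round(f_stat, 2))
```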

(iii) (a) $H_0: \sigma_A^2 = \sigma_B^2$ v $H_1: \sigma_A^2 \ne \sigma_B^2$

(b) As in (i), dotplots show that normality assumption is reasonably valid.

(c) Test statistic is $F = \frac{s_A^2}{s_B^2} \sim F_{n_A - 1,\, n_B - 1}$ under $H_0$

observed $F = \frac{4.8692}{2.0305} = 2.40$ on 12,10 df

$F_{12,10}(10\%) = 2.284$ and $F_{12,10}(5\%) = 2.913$

Thus P-value is between 0.10 and 0.20.

(d) So there is no real evidence of any difference between assessors A and B as regards the variances of their initial estimates for this type of water damage.

This validates the possibly doubtful assumption required in part (ii).

(iv) Overall there is no real evidence to distinguish any differences in the initial estimates for this type of water damage for the two assessors A and B.

13 (i) The scatterplot is shown below.

[Figure: scatterplot of increase in bus use ("trips", vertical axis, 1 to 6) against number of maps ("maps", horizontal axis, 80 to 220)]

The plot suggests that there is a positive relationship between the increase in bus use and the number of maps distributed. The increase seems to be reasonably linear up to around 180,000 maps, after which point it seems to level off (overall, relationship seems curved, possibly quadratic).

(ii) $S_{xx} = \sum x^2 - (\sum x)^2/n = 196800 - (1200)^2/8 = 16800$
$S_{yy} = \sum y^2 - (\sum y)^2/n = 213.4875 - (37.65)^2/8 = 36.29719$
$S_{xy} = \sum xy - (\sum x)(\sum y)/n = 6378 - (1200)(37.65)/8 = 730.5$

$\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{6}\left(36.29719 - \frac{(730.5)^2}{16800}\right) = 0.75558$

$s.e.(\hat{\beta}) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} = \sqrt{\frac{0.75558}{16800}} = 0.006706$

To test $H_0: \beta = 0$ v $H_1: \beta \ne 0$, the test statistic is

$\frac{\hat{\beta} - 0}{s.e.(\hat{\beta})} = \frac{0.04348}{0.006706} = 6.484$,

and under the assumption that the errors of the regression are i.i.d. $N(0, \sigma^2)$ random variables, it has a t distribution with n - 2 = 6 df.
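
The slope test in Question 13(ii) can be recomputed directly from the eight data pairs in the question. The following sketch uses only the standard library and illustrative variable names; the comparison with the t distribution on 6 df is still done from tables.

```python
from math import sqrt

maps = [80, 220, 140, 120, 180, 100, 200, 160]            # x, in thousands
bus = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]    # y, in thousands
n = len(maps)

sxx = sum(x * x for x in maps) - sum(maps) ** 2 / n
syy = sum(y * y for y in bus) - sum(bus) ** 2 / n
sxy = sum(x * y for x, y in zip(maps, bus)) - sum(maps) * sum(bus) / n

beta_hat = sxy / sxx                                   # fitted slope
alpha_hat = sum(bus) / n - beta_hat * sum(maps) / n    # fitted intercept

sigma2_hat = (syy - sxy**2 / sxx) / (n - 2)            # residual variance estimate
se_beta = sqrt(sigma2_hat / sxx)
t_stat = beta_hat / se_beta                            # test of H0: beta = 0

# Point prediction for 250 (thousand) maps; this extrapolates beyond the data
prediction_250 = alpha_hat + beta_hat * 250
print(round(alpha_hat, 3), round(beta_hat, 5), round(t_stat, 3))
```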

From statistical tables we find $t_{6,0.025} = 2.447$ (or $t_{6,0.005} = 3.707$). Therefore, there is strong evidence against $H_0$. We conclude that a straight line representation of the relationship between the increase in bus use and the number of maps distributed would have a non-zero slope.

(iii) The plot is shown below.

[Figure: residuals (vertical axis, -1.0 to 1.0) plotted against fitted values (horizontal axis, 2 to 7)]

Negative residuals are associated with the fitted values at the two ends of the data set, suggesting that the model is inadequate. Pattern suggests that a quadratic model might be appropriate.

(iv) Predicted value is $\hat{y} = -1.816 + 0.04348 \times 250 = 9.054$

This uses extrapolation on the fitted regression line. The prediction is probably not valid, especially as the linear model does not seem adequate.

END OF EXAMINERS’ REPORT

Faculty of Actuaries Institute of Actuaries

EXAMINATION

4 October 2007 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 13 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator.

© Faculty of Actuaries CT3 S2007 © Institute of Actuaries

1 Data collected on claim amounts (£) for two postcode regions give the following values for n (the number of claims), $\bar{x}$ (the mean claim amount) and s (the sample standard deviation) of the claim amounts.

             Region 1   Region 2
n              25         18
$\bar{x}$     120.2      142.7
s              58.1       62.2

Calculate, to one decimal place, the mean and sample standard deviation of the claim amounts for both regions combined. [4]

2 The random variable X has probability density function

$f(x) = \frac{2}{x^3}, \quad x > 1$

and cumulative distribution function

$F(x) = \begin{cases} 0, & x < 1 \\ 1 - \frac{1}{x^2}, & x \ge 1 \end{cases}$

Use the following uniform(0,1) random numbers 0.5719, 0.8612, 0.3028 to simulate three observations of X, explaining your method and calculations clearly. [4]

3 It is known that 24% of the customers in a bank holding a current account also have another type of account with the bank.

Calculate an approximate value for the probability that fewer than 50 customers in a

random sample of 250 customers with a current account also have another type of account. [3]

4 In a random sample of 200 policies from a company’s private motor business, there

are 68 female policyholders and 132 male policyholders. Calculate an approximate 99% confidence interval for the proportion of policyholders

who are female in the corresponding population of all policyholders. [3]

5 For a particular insurance company a sample of eight claim amounts (in units of £1,000) on household contents is taken. The data give $\sum_{i=1}^{8} x_i = 56.7$ and $\sum_{i=1}^{8} x_i^2 = 403.95$. The claim amounts are assumed to follow a normal distribution.

(i) Calculate a 90% confidence interval for the true mean claim amount. [3]

(ii) Use the confidence interval calculated in (i) above to comment on an expert's assessment that the average claim amount for the company is £6,500. [1]

[Total 4]

6 In an investigation into the relationship between two normally distributed variables, X and Y, based on a sample of 15 points, it is desired to perform the following test concerning the true underlying correlation coefficient $\rho$

$H_0: \rho = 0$ v $H_1: \rho > 0$.

Use the t distribution to determine an upper critical value for the sample correlation coefficient r for this test at the 1% level. [4]

7 Let N denote the number of claims which arise in a portfolio of business and let $X_i$ be the amount of the ith claim. Let each of the $X_i$'s be independently modelled as a normal variable with mean £10,000 and standard deviation £2,000 and let N be independently modelled as a Poisson variable with parameter 20.

Calculate the mean and standard deviation of the total claim amount $S = X_1 + \dots + X_N$. [3]

8 Claim sizes in a certain insurance situation are modelled by a normal distribution with mean $\mu$ = £30,000 and standard deviation $\sigma$ = £4,000. The insurer defines a claim to be a large claim if the claim size exceeds £35,000.

(i) Calculate the probabilities that the size of a claim exceeds (a) £35,000, and (b) £36,000. [2]

(ii) Calculate the probability that the size of a large claim (as defined by the insurer) exceeds £36,000. [2]

(iii) Calculate the probability that a random sample of 5 claims includes 2 which exceed £35,000 and 3 which are less than £35,000. [2]

[Total 6]

9 For a certain class of policies issued by a large insurance company it is believed that the probability of each policy giving rise to any claims is 0.5, independently of all other policies. A random sample of 250 such policies is selected.

(i) Determine approximately the probability that at least 139 of the policies in the sample will each give rise to any claims. [4]

(ii) Suppose we do observe that 139 policies in our sample give rise to at least one claim. Use your answer to part (i) to determine whether this suggests at the 1% level of significance that the probability of any claims arising from a policy of this certain class is greater than initially believed. [3]

[Total 7]

10 A chi-square test of association for the frequency data in the following 2 × 3 table

                   Factor A
                   A1     A2     A3
Factor B    B1     40     30     50
            B2     80     30     70

produces a chi-square statistic with value 4.861 and associated P-value 0.089.

Consider a chi-square test of association for the data in the following 2 × 3 table, in which all frequencies are twice the corresponding frequencies in the first table:

                   Factor A
                   A1     A2     A3
Factor B    B1     80     60    100
            B2    160     60    140

(i) State, or calculate, the value of the chi-square test statistic for the second table. [2]

(ii) Find the P-value associated with the test statistic in (i). [1]

(iii) Comment on the results. [2]

[Total 5]

11 Suppose that the random variable X follows an exponential distribution with probability density function

$f(x) = \lambda e^{-\lambda x}, \quad 0 < x < \infty \quad (\lambda > 0)$.

Define a new random variable $Y = X^{1/3}$.

(i) (a) Show that the cumulative distribution function of Y is given by

$F_Y(y) = \begin{cases} 1 - \exp(-\lambda y^3), & y \ge 0 \\ 0, & y < 0 \end{cases}$

and hence, or otherwise, find the probability density function of Y.

(b) Explain how you would simulate a value of Y given a value u from the uniform U(0,1) distribution. [7]

(ii) (a) Find an expression for the maximum likelihood estimator of the parameter $\lambda$, using a sample $y_1, y_2, \dots, y_n$ from the distribution of Y.

(b) Eight observed values of the random variable Y are given below:

0.72 1.15 1.26 1.03 1.69 1.30 1.42 1.15

Calculate the maximum likelihood estimate of $\lambda$ using these values. [6]

(iii) (a) The hazard function of a continuous random variable T is defined as $h(t) = \frac{f(t)}{S(t)}$, where f(t) denotes the probability density function and S(t) denotes the survival function defined as $S(t) = P(T > t)$.

Derive the hazard functions of the random variables X and Y defined above.

(b) If a random variable T represents the lifetime of an individual, then the hazard function h(t), as defined in part (iii)(a), gives the instantaneous mortality rate (that is, the force of mortality) at time t for that individual.

State (with reasons) which of the two random variables (X and Y) you would use to model the lifetime of pensioners for a period of time longer than one year, basing your answer on the form of the corresponding hazard functions derived in part (iii)(a). [5]

[Total 18]

12 A series of n geomagnetic readings are taken from a meter, but the readings are judged to be approximate and unreliable. The chief scientist involved does know however that the true values are all positive and she suggests that an appropriate model for the readings is that they are independent observations of a random variable which is uniformly distributed on $(0, \theta)$, where $\theta > 1$.

(i) Suppose that the chief scientist knows only that the number, M, of the readings which are less than 1 is m, with the remaining n − m being greater than 1 and that she adopts the model as suggested above.

(a) Show that the probability that a single reading is less than 1 is $\frac{1}{\theta}$.

(b) Demonstrate that the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \frac{n}{m}$.

(c) Demonstrate that the Cramer-Rao lower bound (CRlb) for estimating $\theta$ is $\frac{\theta^2(\theta - 1)}{n}$ and hence state the large sample distribution of $\hat{\theta}$. [10]

(ii) Suppose that exactly 45 readings in a random sample of 100 readings are less than 1.

(a) Calculate an estimate of the standard error of $\hat{\theta}$ and hence calculate an approximate two-sided 95% confidence interval for $\theta$.

(b) Use the large sample distribution of $\hat{\theta}$ to test the hypotheses $H_0: \theta = 3$ v $H_1: \theta < 3$. [9]

[Total 19]

13 In a laboratory experiment a response variable (yield, y) is thought to be affected by a quantitative factor (percentage of catalyst, x). The experiment involved making four observations of y at each of four values of x, being 12%, 14%, 16% and 18%, and resulted in the following observed response data.

12%   14%   16%   18%
46    56    56    47
51    57    63    51
47    63    60    54
42    60    64    55

These data are analysed by two statisticians, A and B, who use an analysis of variance approach and a linear regression approach, respectively.

(i) Statistician A's approach:

You are given the following data summaries: sub-totals $\sum y$ = 186, 236, 243 and 207 at x = 12, 14, 16 and 18, respectively, and overall totals $\sum y = 872$ and $\sum y^2 = 48196$.

(a) Apply a one-way analysis of variance to these data and obtain the resulting F-value for the usual test.

(b) Show that the P-value for the test is substantially less than 0.01, by referring to tables of percentage points for the F distribution.

(c) The result of part (b) above shows that there is very strong evidence of an effect on y due to the quantitative factor x. Suggest a suitable diagram that statistician A could now use to describe the effect of x on y. Draw this diagram and hence comment on the effect of x on y.

(d) The graph below shows the residuals plotted against the values of x:

Comment briefly on any implications of this graph. [10]

(ii) Statistician B's approach:

You are given the following data summaries: $\sum x = 240$, $\sum y = 872$, $\sum x^2 = 3680$, $\sum y^2 = 48196$, $\sum xy = 13150$.

(a) Perform a linear regression analysis on these data to show that the fitted line is given by y = 41.4 + 0.875x.

(b) Perform the hypothesis test on the slope coefficient $H_0: \beta = 0$ v $H_1: \beta \ne 0$, showing that the P-value is greater than 0.20. Comment on what this implies about the relationship between x and y.

(c) The graph below shows the residuals plotted against the values of x:

Comment briefly on what this graph implies about the effect of x on y.

(d) Suggest an additional analysis statistician B could now use to describe the effect of x on y. [10]

[Total 20]

END OF PAPER

Faculty of Actuaries Institute of Actuaries

EXAMINATION

September 2007

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

M A Stocker
Chairman of the Board of Examiners
December 2007

Comments

The paper was answered quite well overall and there are no particular topics that stand out as being poorly attempted. Similarly there were no particular misunderstandings widely evident, and no particular errors were made so repeatedly as to be worthy of comment.

© Faculty of Actuaries © Institute of Actuaries

Subject CT3 (Probability and Mathematical Statistics Core Technical) — September 2007 — Examiners’ Report

1 $\bar{x} = \frac{\sum x}{n} = \frac{25(120.2) + 18(142.7)}{25 + 18} = \frac{5573.6}{43} = 129.6$

Using the fact that $\sum x^2 = (n-1)s^2 + n\bar{x}^2$, then for the combined set

$\sum x^2 = [24(58.1)^2 + 25(120.2)^2] + [17(62.2)^2 + 18(142.7)^2] = 874525.14$

$\therefore s^2 = \frac{874525.14 - (5573.6)^2/43}{42} = 3621.02 \quad \therefore s = 60.2$

2 Method: set uniform(0,1) random number $r = F(x) = 1 - 1/x^2$

$\Rightarrow$ simulated observation $x = [1/(1 - r)]^{1/2}$

Here we get x = 1.528, 2.684, 1.198.

Note: We can do away with the step of subtracting r from 1 and use $x = (1/r)^{1/2}$. This gives x = 1.322, 1.078, 1.817.

3 Let N be the number who have another type of account.

$\therefore N \sim \text{binomial}(250, 0.24) \approx N(60, 45.6) = N(60, 6.753^2)$

$P(N < 50) \rightarrow P(N < 49.5)$ with continuity correction

$= P\left(Z < \frac{49.5 - 60}{6.753}\right) = P(Z < -1.55) = 1 - 0.93943 = 0.061$

4 Sample proportion P = 68/200 = 0.34

99% CI is $0.34 \pm 2.576\sqrt{\frac{0.34 \times 0.66}{200}}$, i.e. $0.34 \pm 0.086$, i.e. (0.254, 0.426)
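
The inverse-transform method described in the solution to Question 2 can be sketched as follows. The function name simulate_x is an illustrative choice; the three uniform numbers are those given in the question.

```python
from math import sqrt


def simulate_x(u):
    """Invert F(x) = 1 - 1/x**2 (x >= 1) at the uniform value u."""
    return sqrt(1 / (1 - u))


uniforms = [0.5719, 0.8612, 0.3028]
simulated = [round(simulate_x(u), 3) for u in uniforms]
print(simulated)
```

Replacing 1 - u by u inside the square root gives the alternative set of values mentioned in the note, since 1 - U is also uniform on (0,1).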

5 (i) $\bar{x} = \frac{56.7}{8} = 7.0875$

$s^2 = \frac{1}{7}\left(403.95 - \frac{56.7^2}{8}\right) = 0.298 \Rightarrow s = 0.546$.

90% CI for the true mean is given by:

$\bar{x} \pm t_{7,0.05}\frac{s}{\sqrt{n}} = 7.0875 \pm 1.895 \times \frac{0.546}{\sqrt{8}} = 7.0875 \pm 0.3658$

i.e. the 90% CI is (6.722, 7.453), or (£6722, £7453).

(ii) The value £6500 is not included in the CI above, and therefore we conclude that the data are not consistent with the expert's assessment at the 10% significance level.

6 Use the result $\frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$ under $H_0$.

From tables $t_{0.01,13} = 2.650$

so critical value is solution of $\frac{\sqrt{13}\,r}{\sqrt{1 - r^2}} = 2.65$

Solving gives $r = \sqrt{\frac{2.65^2/13}{1 + 2.65^2/13}} = 0.592$

7 From the Yellow Book:

Mean: $E(S) = E(N)E(X) = (20)(10000)$ = £200,000

Variance: $var(S) = E(N)\,var(X) + var(N)[E(X)]^2 = 20(2000^2) + 20(10000^2) = 2080 \times 10^6$

$\therefore$ Standard deviation = £45607

[OR, using compound Poisson results (in Yellow Book) $E(S) = \lambda m_1$ and $var(S) = \lambda m_2$ where $\lambda = E(N)$ and $m_r = E(X^r)$]
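
The compound Poisson moments used in Question 7 can be written as a short helper. This is a sketch under the stated model (Poisson claim numbers, independent claim sizes); the function name compound_poisson_moments is hypothetical.

```python
from math import sqrt


def compound_poisson_moments(lam, claim_mean, claim_sd):
    """Mean and standard deviation of S = X_1 + ... + X_N with N ~ Poisson(lam).

    Uses E(S) = E(N)E(X) and var(S) = E(N)var(X) + var(N)E(X)^2,
    where E(N) = var(N) = lam for a Poisson count.
    """
    mean_s = lam * claim_mean
    var_s = lam * claim_sd**2 + lam * claim_mean**2
    return mean_s, sqrt(var_s)


mean_s, sd_s = compound_poisson_moments(20, 10_000, 2_000)
print(mean_s, round(sd_s))
```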

8 X ~ N with mean $\mu = 30$ and $\sigma = 4$ (working in units of £1000)

(i) (a) $P(X > 35) = P\left(Z > \frac{35 - 30}{4}\right) = P(Z > 1.25) = 1 - 0.89435 = 0.10565$

(b) $P(X > 36) = P\left(Z > \frac{36 - 30}{4}\right) = P(Z > 1.5) = 1 - 0.93319 = 0.06681$

(ii) $P(X > 36 \mid X > 35) = P(X > 36 \text{ and } X > 35) / P(X > 35) = P(X > 36) / P(X > 35)$
$= P(Z > 1.5)/P(Z > 1.25) = 0.06681/0.10565 = 0.632$

(iii) $\binom{5}{2} \times 0.1056^2 \times 0.8944^3 = 0.0798$

9 (i) If X is the random variable denoting the number of policies giving a claim, then X ~ binomial(250, 0.5).

Using the normal approximation (CLT), $X \approx N(125, 62.5)$.

Using the appropriate continuity correction we have:

$P(X \ge 139) = P(X > 138.5) = P\left(Z > \frac{138.5 - 125}{\sqrt{62.5}}\right) = 1 - \Phi(1.7076) = 0.044$.

(ii) This is a one-sided test of $H_0: p = 0.5$ v $H_1: p > 0.5$. P-value of the test is 0.044 from part (i). The evidence against the hypothesis that p = 0.5 (and in favour of p > 0.5) is not strong enough to justify rejecting it at the 1% level of testing; we cannot conclude "p > 0.5".
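
The normal probabilities in Question 8 can be checked without tables. The sketch below uses statistics.NormalDist from the Python standard library and works in units of £1,000 as the solution does; the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

claim = NormalDist(mu=30, sigma=4)   # claim size in units of £1,000

p35 = 1 - claim.cdf(35)              # P(X > 35): probability of a large claim
p36 = 1 - claim.cdf(36)              # P(X > 36)

# P(X > 36 | X > 35): the event {X > 36} is contained in {X > 35}
p_large_exceeds_36 = p36 / p35

# Exactly 2 of 5 independent claims exceed 35 (binomial probability)
p_two_of_five = comb(5, 2) * p35**2 * (1 - p35) ** 3

print(round(p35, 5), round(p36, 5), round(p_large_exceeds_36, 3), round(p_two_of_five, 4))
```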

10 (i) Chi-square statistic is doubled and has value 9.722 OR work it out

(ii) P-value is given by $P(\chi^2_2 > 9.722) = 0.0077$

Note: answer = 0.008 is acceptable for the mark

(iii) Comment: With the first table we do not have strong enough evidence to justify rejecting the hypothesis of no association. In the second table, we have the same proportions in the columns, but based on more data, and now we do have strong enough evidence (P-value < 1%) to justify rejecting the hypothesis of no association.
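
The doubling of the chi-square statistic in Question 10 can be illustrated with a short calculation. The sketch below computes the usual statistic from expected frequencies; for the P-value it relies on the fact that a chi-square variable with 2 degrees of freedom has upper tail probability exp(-x/2), so no tables or external libraries are needed. Function names are illustrative.

```python
from math import exp


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_2df(stat):
    """Upper tail of chi-square with 2 df, which equals exp(-stat / 2)."""
    return exp(-stat / 2)


first = [[40, 30, 50], [80, 30, 70]]
second = [[2 * f for f in row] for row in first]

stat1 = chi_square_statistic(first)
stat2 = chi_square_statistic(second)
print(round(stat1, 3), round(stat2, 3), round(p_value_2df(stat2), 4))
```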

11 (i) (a) $Y = X^{1/3} \Rightarrow X = Y^3$, and range of Y is $(0, \infty)$.

The cdf is given by $F_Y(y) = P(Y \le y) = P(X \le y^3) = F_X(y^3)$

$\therefore F_Y(y) = \begin{cases} 1 - \exp(-\lambda y^3), & y \ge 0 \\ 0, & y < 0 \end{cases}$

(using formulae or by integration). Then, the pdf of Y can be derived as

$f_Y(y) = \frac{d}{dy}F_Y(y) = 3\lambda y^2 \exp(-\lambda y^3)$.

[OR, directly as

$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \lambda e^{-\lambda y^3}\, 3y^2$

$\Rightarrow f_Y(y) = 3\lambda y^2 \exp(-\lambda y^3)$,

OR, from formulae, identifying the cdf as that of a Weibull distribution with $c = \lambda$, $\gamma = 3$.]

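(b) First simulate X ~ exp($\lambda$) as

$u = 1 - e^{-\lambda x} \Rightarrow x = -\frac{1}{\lambda}\log(1 - u)$,

then set $y = x^{1/3}$.

[OR, use cdf of Y directly, i.e. $u = 1 - e^{-\lambda y^3} \Rightarrow y = \left\{-\frac{1}{\lambda}\log(1 - u)\right\}^{1/3}$]

(ii) (a) $L(\lambda) = \prod_{i=1}^{n} f(y_i; \lambda) = \prod_{i=1}^{n}\left\{3\lambda y_i^2 \exp(-\lambda y_i^3)\right\} = 3^n \lambda^n \left(\prod_{i} y_i^2\right) \exp\left(-\lambda \sum_{i} y_i^3\right)$

$\ell(\lambda) = \log L(\lambda) = n\log(\lambda) - \lambda \sum_i y_i^3 + \text{constant}$

$\ell'(\lambda) = \frac{n}{\lambda} - \sum_i y_i^3$

$\ell'(\lambda) = 0 \Rightarrow \hat{\lambda} = \frac{n}{\sum_i y_i^3}$

[Check that $\ell''(\lambda) = -\frac{n}{\lambda^2} < 0$.]

(b) For the given data we have $\sum_i y_i^3 = 16.3952$

$\therefore \hat{\lambda} = \frac{n}{\sum_i y_i^3} = \frac{8}{16.3952} = 0.488$.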
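
The estimate in part (ii)(b) and the simulation method in part (i)(b) can both be sketched in a few lines. The data are the eight values from the question; the function name simulate_y and the uniform value 0.3 used in the usage line are illustrative assumptions.

```python
from math import exp, log

observations = [0.72, 1.15, 1.26, 1.03, 1.69, 1.30, 1.42, 1.15]

# MLE: lambda-hat = n / sum(y_i^3)
sum_cubes = sum(y**3 for y in observations)
lambda_hat = len(observations) / sum_cubes


def simulate_y(u, lam):
    """Invert F_Y(y) = 1 - exp(-lam * y**3) at the uniform value u."""
    return (-log(1 - u) / lam) ** (1 / 3)


# Usage: one simulated value of Y from an arbitrary uniform number
example_y = simulate_y(0.3, lambda_hat)
print(round(sum_cubes, 4), round(lambda_hat, 3))
```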

(iii) (a) For X ~ exp($\lambda$) we have

$h(x) = \frac{f(x)}{S(x)} = \frac{\lambda e^{-\lambda x}}{1 - (1 - e^{-\lambda x})} = \lambda$

For Y (using pdf and cdf derived above):

$h(y) = \frac{f(y)}{S(y)} = \frac{3\lambda y^2 \exp(-\lambda y^3)}{1 - \left(1 - \exp(-\lambda y^3)\right)} = 3\lambda y^2$.

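(b) X has a constant hazard rate $h(x) = \lambda$, and therefore should only be used when the force of mortality can be assumed constant, e.g. over a one-year period of time in mortality studies. For longer periods of lifetime the r.v. Y is more suitable, as it gives an increasing hazard function with time.

12 (i) Let X be a reading and M be the number of readings which are less than 1

(a) Since X ~ U(0, $\theta$), P(X < 1) = length of [0,1] / length of [0, $\theta$] = $1/\theta$

(b) $L(\theta) \propto \left(\frac{1}{\theta}\right)^m \left(1 - \frac{1}{\theta}\right)^{n-m} \Rightarrow \ell(\theta) = m\log\left(\frac{1}{\theta}\right) + (n - m)\log\left(\frac{\theta - 1}{\theta}\right)$

$\Rightarrow \ell(\theta) = (n - m)\log(\theta - 1) - n\log\theta$

$\Rightarrow \frac{\partial \ell}{\partial\theta} = \frac{n - m}{\theta - 1} - \frac{n}{\theta}$; set to zero $\Rightarrow \hat{\theta} = n/m$

OR Since M ~ bi(n, $1/\theta$), MLE of $1/\theta$ is the sample proportion of readings which are < 1, namely m/n, so $\widehat{(1/\theta)} = m/n \Rightarrow 1/\hat{\theta} = m/n \Rightarrow \hat{\theta} = n/m$

(c) $\frac{\partial \ell}{\partial\theta} = \frac{n - m}{\theta - 1} - \frac{n}{\theta} \Rightarrow -\frac{\partial^2 \ell}{\partial\theta^2} = \frac{n - m}{(\theta - 1)^2} - \frac{n}{\theta^2}$

$\therefore E\left[-\frac{\partial^2 \ell}{\partial\theta^2}\right] = E\left[\frac{n - M}{(\theta - 1)^2}\right] - \frac{n}{\theta^2} = \frac{n - n/\theta}{(\theta - 1)^2} - \frac{n}{\theta^2} = \frac{n}{\theta^2(\theta - 1)}$

$\therefore$ CRlb $= \frac{\theta^2(\theta - 1)}{n}$

Large sample distribution of $\hat{\theta}$ is $\hat{\theta} \approx N\left(\theta, \frac{\theta^2(\theta - 1)}{n}\right)$

(ii) (a) n = 100, m = 45, $\hat{\theta} = 100/45 = 2.222$

Estimate of standard error of $\hat{\theta}$ = $[(100/45)^2(100/45 - 1)/100]^{1/2} = 0.2457$

$\Rightarrow$ approximate 95% CI for $\theta$ is given by $2.222 \pm 1.96 \times 0.2457$, i.e. $2.222 \pm 0.482$, i.e. (1.74, 2.70).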
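
The estimate, standard error and interval in Question 12(ii)(a) follow mechanically from the formulas derived in part (i). Here is a minimal sketch with illustrative variable names; 1.96 is the usual two-sided 95% normal point.

```python
from math import sqrt

n, m = 100, 45                     # sample size and number of readings below 1

theta_hat = n / m                  # MLE of theta
# Estimated standard error: square root of the CRlb evaluated at theta_hat
se_hat = sqrt(theta_hat**2 * (theta_hat - 1) / n)

ci = (theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)
print(round(theta_hat, 3), round(se_hat, 4), tuple(round(c, 2) for c in ci))
```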

(b) Under $H_0$: $\hat{\theta} \sim N(3, 0.18)$

P-value $= P(\hat{\theta} < 2.222) = P\left(Z < \frac{2.222 - 3}{0.4243}\right) = P(Z < -1.834) = 0.033$ (by interpolation in the table)

We can reject $H_0$ (at levels of testing down to 3.3%) and conclude that $\theta < 3$.

[OR note that P(Z < −1.834) is less than 0.05, so "reject $H_0$ at 5% level"]

13 (i) (a) $SS_T = 48196 - 872^2/16 = 48196 - 47524 = 672$
$SS_B = (186^2 + 236^2 + 243^2 + 207^2)/4 - 47524 = 523.5$
$SS_R = 672 - 523.5 = 148.5$

$F = \frac{523.5/3}{148.5/12} = \frac{174.5}{12.375} = 14.10$

[or construct an ANOVA table]

(b) $F_{3,12}(1\%) = 5.953$ from tables; since 14.10 >> 5.953, P-value << 0.01

(c) Statistician A could plot either the individual y values or the four means of y against x to see what "shape" the effect might take. Here is a plot of the individual y values:

The shape of the effect seems to be curved, initially increasing, then decreasing.

(d) The implications are simply that there is nothing to invalidate the

assumptions required for the analysis.

(ii) (a) $S_{xx} = 3680 - \frac{240^2}{16} = 80$

$S_{yy} = 48196 - \frac{872^2}{16} = 672$ [or could state it is the same as SST from (i)]

$S_{xy} = 13150 - \frac{(240)(872)}{16} = 70$

$\hat{\beta} = \frac{70}{80} = 0.875$ as required

$\hat{\alpha} = \frac{1}{16}(872 - 0.875(240)) = 41.375$ as required.

(b) $\hat{\sigma}^2 = \frac{1}{14}\left(672 - \frac{70^2}{80}\right) = 43.625$

$\text{s.e.}(\hat{\beta}) = \sqrt{\frac{43.625}{80}} = 0.7385$

$t = \frac{0.875 - 0}{0.7385} = 1.185$ on 14 d.f.

P-value = 2 × P(t14 > 1.185)

As P(t14 > 1.345) = 0.10 from tables, P-value > 2(0.10), i.e. > 0.20

This implies that there is no evidence against H0, and hence that there is no linear relationship between x and y. [not that there is no “relationship”]

(c) Residual plot suggests that there may be a curved, rather than linear, relationship between x and y.

(d) Statistician B could try a quadratic regression (or some other curved form) of y on x.
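As a cross-check of parts (ii)(a) and (ii)(b), the same quantities can be computed from the summary statistics that appear in the working (n = 16, Σx = 240, Σx² = 3680, Σy = 872, Σy² = 48196, Σxy = 13150). This is a minimal sketch; the variable names are illustrative.

```python
import math

# Summary statistics as they appear in the working above
n = 16
sum_x, sum_x2 = 240, 3680
sum_y, sum_y2 = 872, 48196
sum_xy = 13150

# Corrected sums of squares and products
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates
beta_hat = s_xy / s_xx
alpha_hat = (sum_y - beta_hat * sum_x) / n

# Error variance estimate on n - 2 d.f., and t statistic for H0: beta = 0
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
se_beta = math.sqrt(sigma2_hat / s_xx)
t_stat = (beta_hat - 0) / se_beta

print(beta_hat, alpha_hat, sigma2_hat, round(se_beta, 4), round(t_stat, 3))
```

The comparison of the t statistic with the t14 distribution is left to tables, as in the report.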

END OF EXAMINERS’ REPORT


Faculty of Actuaries Institute of Actuaries

EXAMINATION

10 April 2008 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 13 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 A2008 © Institute of Actuaries


1 The number of claims which arose during the calendar year 2005 on each of a group of 80 private motor policies was recorded and resulted in the following frequency distribution:

Number of claims x      0   1   2   3   4
Number of policies f   64  12   3   0   1

For these data Σfx = 22, Σfx² = 40

Calculate the sample mean and standard deviation of the number of claims per policy. [3]

2 Data on a sample of 29 claim amounts give a sample mean of £461.5 and a sample standard deviation of £618.8.

One claim amount of £3,657.50 is identified as an outlier and after investigation is found to be in error.

Calculate the revised sample mean and standard deviation if this erroneous amount is removed. [4]

3 The following sample contains claim amounts (£) on a particular class of insurance policies:

1,717 1,595 1,764 1,464 1,854 1,560 1,698 1,614 1,524 4,320 1,626 1,440 1,602

(i) Determine the mean and the median of the claim amounts. [2]

(ii) State, with reasons, which of the two measures considered above you would prefer to use to estimate the central point of the claim amounts. [1]
[Total 3]

4 Consider two events A and B, such that P(A) = 0.3 and P(A ∩ B) = 0.1.

Find the minimum and maximum possible values of the conditional probability P(A | B). [4]


5 An insurance company covers claims from four different non-life portfolios, denoted as G1, G2, G3 and G4. The number of policies included in each portfolio is given below:

Portfolio            G1      G2      G3      G4
No. of policies   4,000   7,000  13,000   6,000

It is estimated that the percentages of policies that will result in a claim in the following year in each of the portfolios are 8%, 5%, 2% and 4% respectively.

Suppose a policy is chosen at random from the group of 30,000 policies comprising the four portfolios after one year and it is found that a claim did arise on this policy during the year. Calculate the probability that the selected policy comes from portfolio G3. [3]

6 Consider two random variables X and Y with joint probability density function (pdf)

$f(x, y) = \frac{4}{3}(1 - xy), \quad 0 < x < 1, \; 0 < y < 1$.

The marginal pdf of X is given by

$f(x) = \frac{2}{3}(2 - x), \quad 0 < x < 1$

with a corresponding marginal pdf for Y by symmetry (you are not asked to verify these marginal densities).

(i) Show that the conditional pdf of Y given X = x is given by

$f(y \mid x) = \frac{2(1 - xy)}{(2 - x)}, \quad 0 < y < 1.$ [2]

(ii) (a) Determine the conditional expectation E(Y | X = x) as a function of x and hence determine E(Y).

(b) Verify your answer in part (a) by determining E(Y) directly from the marginal pdf of Y. [5]
[Total 7]


7 The claim amount X in units of £1,000 for a certain type of industrial policy is modelled as a gamma variable with parameters α = 3 and λ = ¼.

(i) Use moment generating functions to show that $\frac{1}{2}X \sim \chi^2_6$. [3]

(ii) Hence use tables to find the probability that a claim amount exceeds £20,000. [2]
[Total 5]

8 A woodcutter has to cut 100 fence posts of a standard length and he has a metal bar of the required length to act as the standard. The woodcutter decides to vary his procedure from post to post − he cuts the first post using the metal standard, then uses this post as his standard for the cut of the next post. He continues in a similar manner, each time using the most recently cut post as the standard for the next cut.

Each time the woodcutter cuts a post there is an error in the length cut relative to the standard being employed for that cut − you should assume that the errors are independent observations of a random variable with mean 0 and standard deviation 3mm.

Calculate, approximately, the probability that the length of the final post differs from the length of the original metal standard by more than 15mm. [5]

9 A researcher wishes to investigate whether a coin is balanced or not, that is if P(heads) = 0.5. She throws the coin four times and decides to accept the hypothesis H0: P(heads) = 0.5 in a test against the alternative H1: P(heads) ≠ 0.5, if the number of times that the coin lands “heads” is 1, 2, or 3.

(i) Calculate the probability of the type I error of this test. [3]

(ii) Calculate the probability of the type II error of this test, if the true probability that the coin lands “heads” is 0.7. [3]
[Total 6]

10 Pressure readings are taken regularly from a meter. It transpires that, in a random sample of 100 such readings, 45 are less than 1, 35 are between 1 and 2, and 20 are between 2 and 3.

Perform a χ² goodness of fit test of the model that states that the readings are independent observations of a random variable that is uniformly distributed on (0, 3). [5]


11 In an investigation about the duration of insurance policies of a certain type, a sample of n policies is studied. All n policies have been initiated at the same time, which is also the time of the start of the investigation. For each policy, the time T (in months) until the policy expires can be modelled as an exponential random variable with parameter λ, independently of the times for all other policies.

(i) Suppose that the investigation is terminated as soon as k policies have expired, where k is a known (predetermined) constant. The observed policy expiry times are denoted by t1, t2, …, tk with 0 < k ≤ n and t1 < t2 < … < tk.

(a) Show that the probability that any randomly selected policy is still in force at the time of the termination of the investigation is $e^{-\lambda t_k}$.

(b) Show that the likelihood function of the parameter λ, using information from all n policies, is given by

$L(\lambda) = \lambda^k e^{-\lambda \sum_{i=1}^{k} t_i} \, e^{-(n-k)\lambda t_k}$.

Hence find the maximum likelihood estimate (MLE) of λ.

(c) Consider an investigation on 20 policies which is terminated when five policies have expired, giving the following observed expiry times (in months):

1.03 6.67 12.70 12.88 21.54

Calculate the MLE of λ based on this sample. [9]

(ii) Suppose instead that the investigation is terminated after a fixed length of time t0. The number of policies that have expired by time t0 is considered to be a random variable, denoted by K.

(a) Explain clearly why the distribution of K is binomial and determine its parameters.

(b) Hence find the MLE of λ in this case.

(c) Consider an investigation on 20 policies that is terminated after 24 months. By the time of termination five policies have expired. Use this information to calculate the MLE of λ in this case. [9]
[Total 18]


12 The members of the computer games clubs of three neighbouring schools decide to take part in a light-hearted competition. Each club selects five of its members at random under a procedure agreed and supervised by the clubs. Each selected student then plays a particular game at the end of which the score he/she has attained is displayed and recorded – the standard set by the games designers is such that reasonably competent players should score about 100.

The results are as follows:

School 1   105  134   96  147  116
School 2   103   81   91  100  110
School 3   137  115  105  123  149

(i) An analysis of variance is conducted on these results and gives the following ANOVA table:

Source of variation   d.f.     SS     MSS
Between schools          2   2,298   1,149
Residual                12   3,468     289
Total                   14   5,766

(a) Test the hypothesis that there are no school effects against a general alternative.

You should quote a narrow range of values within which the probability-value of the data lies, and state your conclusion clearly.

(b) Calculate a 95% confidence interval for the underlying mean score for club members in School 1, using the information available from all three schools. [7]


(ii) The members of the computer games club of a nearby fourth school hear about the competition and ask to be included in the overall comparison. Scores for a random sample of five of the club members at this school (School 4) are obtained and are:

112 140 88 103 123.

The scores obtained by all twenty students are shown in the display below:

(a) Carry out an analysis of variance on the results for all four schools together − you should construct the ANOVA table and test the hypothesis that there are no school effects against a general alternative.

You should quote an approximate value for the probability-value of the data, and state your conclusion clearly.

(b) Comment briefly on the comparison of the results of the analysis involving Schools 1−3 only conducted in part (i)(a) and the results here of the analysis involving all four schools.

(c) Calculate a 95% confidence interval for the difference in the underlying mean scores for club members in Schools 1 and 2, using the information available from all four schools. [13]
[Total 20]


13 In a medical experiment concerning 12 patients with a certain type of ear condition, the following measurements were made for blood flow (y) and auricular pressure (x):

x:   8.5   9.8  10.8  11.5  11.2   9.6  10.1  13.5  14.2  11.8   8.7   6.8
y:     3    12    10    14     8     7     9    13    17    10     5     5

Σx = 126.5, Σx² = 1,381.85, Σy = 113, Σy² = 1,251, Σxy = 1,272.2

(i) Construct a scatterplot of blood flow against auricular pressure and comment briefly on any relationship between them. [3]

(ii) Calculate the equation of the least-squares fitted regression line of blood flow on auricular pressure. [4]

(iii) (a) Use a suitable pivotal quantity with a t distribution to show how to derive the usual 95% confidence interval for the slope coefficient of the underlying regression line, and calculate the interval.

(b) Use your calculated confidence interval to comment on the hypothesis that the true underlying slope coefficient is equal to 1.5. [5]

(iv) (a) Use a suitable pivotal quantity with a χ² distribution to derive a 95% confidence interval for the underlying error variance σ², and calculate the interval.

(b) Hence calculate a 95% confidence interval for the error standard deviation σ. [5]
[Total 17]

END OF PAPER


Faculty of Actuaries Institute of Actuaries

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

April 2008

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

M A Stocker
Chairman of the Board of Examiners

June 2008

© Faculty of Actuaries © Institute of Actuaries


1 n = 80, Σfx = 22, Σfx² = 40

$\bar{x} = 22/80 = 0.275$

$s^2 = \frac{1}{79}\left(40 - \frac{22^2}{80}\right) = \frac{33.95}{79} = 0.42975 \Rightarrow s = 0.656$

2 $\Sigma x = n\bar{x} = 29(461.5) = 13383.5$

$\Sigma x^2 = (n-1)s^2 + n\bar{x}^2 = 28(618.8)^2 + 29(461.5)^2 = 16898062$

removing the outlier of 3657.5 gives

$\Sigma x = 13383.5 - 3657.5 = 9726$

$\Sigma x^2 = 16898062 - 3657.5^2 = 3520756$

$\therefore \bar{x} = \frac{9726}{28} = \pounds 347.4$

$s^2 = \frac{1}{27}\left[3520756 - \frac{9726^2}{28}\right] = 5272.6 \quad \therefore s = \pounds 72.6$

3 (i) $\bar{x} = \frac{23778}{13} = 1829.08$.

Median = 7th ordered observation = 1614.

(ii) The median should be preferred, as it is not sensitive to the extreme observed claim of £4320.
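The adjustment in Question 2 (rebuilding Σx and Σx² from the quoted mean and standard deviation, then removing one observation) can be sketched in Python as follows. The function name `remove_observation` is illustrative only and not part of the report.

```python
import math


def remove_observation(n, mean, sd, value):
    """Revised sample mean and s.d. after deleting one observation,
    working only from the original n, mean and sample s.d."""
    total = n * mean                                  # sum of x
    total_sq = (n - 1) * sd ** 2 + n * mean ** 2      # sum of x^2
    new_n = n - 1
    new_total = total - value
    new_total_sq = total_sq - value ** 2
    new_mean = new_total / new_n
    new_var = (new_total_sq - new_total ** 2 / new_n) / (new_n - 1)
    return new_mean, math.sqrt(new_var)


# Question 2: 29 claims, mean 461.5, s.d. 618.8, erroneous claim 3657.50
revised_mean, revised_sd = remove_observation(29, 461.5, 618.8, 3657.5)
print(round(revised_mean, 1), round(revised_sd, 1))
```

The same function applies to any single deleted observation, provided the quoted standard deviation uses the n − 1 divisor, as it does here.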

4 $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Maximum value of P(B) is 0.8, in which case $P(A \mid B) = \frac{0.1}{0.8} = 0.125$

Minimum value of P(B) is 0.1, in the case B ⊂ A. Then $P(A \mid B) = 1$


5 Define the following events:
C: Policy results in a claim;
Bi: Policy comes from portfolio i, i = 1, 2, 3, 4.

Then the required probability is P(B3|C), and using Bayes’ theorem:

$P(B_3 \mid C) = \frac{P(C \mid B_3)P(B_3)}{P(C)} = \frac{P(C \mid B_3)P(B_3)}{\sum_i P(C \mid B_i)P(B_i)}$,

which gives

$P(B_3 \mid C) = \frac{0.02 \times \frac{13}{30}}{0.08 \times \frac{4}{30} + 0.05 \times \frac{7}{30} + 0.02 \times \frac{13}{30} + 0.04 \times \frac{6}{30}} = \frac{0.26/30}{1.17/30} = \frac{0.26}{1.17} = 0.222$.

[OR It is possible to argue straight to

260/(320 + 350 + 260 + 240) = 260/1170 = 0.222

which is correct and gets full marks.]
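The alternative "expected claims" argument translates directly into a few lines of Python. This is a sketch using the portfolio sizes and claim percentages given in the question; the dictionary names are illustrative.

```python
# Portfolio sizes and claim probabilities from Question 5
policies = {"G1": 4000, "G2": 7000, "G3": 13000, "G4": 6000}
claim_rate = {"G1": 0.08, "G2": 0.05, "G3": 0.02, "G4": 0.04}

# Expected number of claiming policies in each portfolio
expected_claims = {g: policies[g] * claim_rate[g] for g in policies}
total_claims = sum(expected_claims.values())

# Bayes' theorem: P(portfolio | claim) is each portfolio's share of the claims
posterior = {g: expected_claims[g] / total_claims for g in policies}

print(round(posterior["G3"], 3))
```

Dividing numerator and denominator by 30,000 recovers the form of Bayes' theorem used in the main solution.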

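6 (i) $f(y \mid x) = \frac{f(x, y)}{f(x)} = \frac{\frac{4}{3}(1 - xy)}{\frac{2}{3}(2 - x)} = \frac{2(1 - xy)}{(2 - x)}, \quad 0 < y < 1$

(ii) (a) $E(Y \mid X = x) = \frac{2}{(2 - x)} \int_0^1 y(1 - xy)\,dy$

$= \frac{2}{(2 - x)}\left[\frac{y^2}{2} - \frac{x y^3}{3}\right]_0^1 = \frac{2}{(2 - x)}\left(\frac{1}{2} - \frac{x}{3}\right) = \frac{(3 - 2x)}{3(2 - x)}$

$E(Y) = \int_0^1 \frac{(3 - 2x)}{3(2 - x)} \cdot \frac{2}{3}(2 - x)\,dx$

$= \frac{2}{9}\int_0^1 (3 - 2x)\,dx = \frac{2}{9}\left[3x - x^2\right]_0^1 = \frac{4}{9}$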


(b) $E(Y) = \int_0^1 y \cdot \frac{2}{3}(2 - y)\,dy = \frac{2}{3}\left[y^2 - \frac{y^3}{3}\right]_0^1 = \frac{2}{3} \cdot \frac{2}{3} = \frac{4}{9}$

7 (i) $M_X(t) = E(e^{tX}) = (1 - 4t)^{-3}$ from yellow book

Let $Y = \frac{1}{2}X$.

$\therefore M_Y(t) = E(e^{tY}) = E(e^{tX/2}) = M_X(t/2) = (1 - 2t)^{-6/2}$

which is the m.g.f. of a gamma(3, 1/2) or $\chi^2_6$ variable

(ii) P(X > 20) = P(Y > 10) = 1 – 0.8753 = 0.1247
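The tabulated probability in Question 7(ii) can be checked without tables. For an even number of degrees of freedom 2m, the chi-square upper tail has the closed form P(χ² > x) = e^(−x/2) Σ_{j<m} (x/2)^j / j!. The sketch below relies on that identity; the function name is illustrative and the report itself uses the Formulae and Tables.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# P(X > 20) = P(Y > 10) where Y = X/2 is chi-square with 6 d.f.
p_exceeds = chi2_upper_tail_even_df(10, 6)
print(round(p_exceeds, 4))
```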

8 Let L be the length of the metal bar and Zi be the error that arises at the ith cut.

Length of 1st post cut = L + Z1
Length of 2nd post cut = L + Z1 + Z2
Length of 100th post cut = L + Z1 + Z2 + … + Z100

Error in length of last post cut is E = Z1 + Z2 + … + Z100

E ~ N(0, 900) approximately, by CLT

P(|E| < 15) ≈ P(|Z| < 15/30) = P(|Z| < 0.5) = 2 × 0.1915 = 0.383

So P(error exceeds 15mm) ≈ 1 – 0.383 = 0.617

9 (i) Probability of type I error is

α = P(reject H0 | H0 is true) = P(X = 0 or X = 4 | H0 is true), which gives

$\alpha = P\{X = 0 \mid P(\text{Heads}) = 0.5\} + P\{X = 4 \mid P(\text{Heads}) = 0.5\} = \left(\frac{1}{2}\right)^4 + \left(\frac{1}{2}\right)^4 = 0.125.$
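Both error probabilities for the coin test in Question 9 come from the same binomial calculation, evaluated at different values of P(heads). A minimal Python sketch follows; the function name `prob_accept` is illustrative. The type II value at P(heads) = 0.7 is the quantity asked for in part (ii).

```python
from math import comb


def prob_accept(p, n=4, accept=(1, 2, 3)):
    """P(number of heads falls in the acceptance region) when P(heads) = p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in accept)


type_one = 1 - prob_accept(0.5)   # reject H0 although the coin is balanced
type_two = prob_accept(0.7)       # accept H0 although P(heads) = 0.7

print(type_one, round(type_two, 4))
```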


(ii) Probability of type II error of the test at P(Heads) = 0.7 is

β = P(accept H0 | H1 is true) = 1 − P{X = 0 or X = 4 | P(Heads) = 0.7}

$\Rightarrow \beta = 1 - (0.3^4 + 0.7^4) = 0.7518.$

[OR using P{X = 1 or X = 2 or X = 3 | P(heads) = 0.7}]

10
range                  0–1     1–2     2–3
observed frequency      45      35      20
expected frequency   100/3   100/3   100/3

$\chi^2 = [(45 - 100/3)^2 + (35 - 100/3)^2 + (20 - 100/3)^2]/(100/3) = 9.50$ on 2 df

P-value = $P(\chi^2_2 > 9.50) < 0.01$

Reject model (at the 1% level of testing) as not providing a good fit to the data.

OR 5% point of $\chi^2_2$ is 5.991, so we reject model at 5%
OR 1% point of $\chi^2_2$ is 9.210, so we reject model at 1%
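The goodness of fit statistic in Question 10 is straightforward to compute directly. With 2 degrees of freedom the upper tail probability also has the simple form e^(−x/2), which the sketch below uses in place of tables; this shortcut applies to 2 d.f. only.

```python
import math

# Observed counts in the three equal-width ranges (0,1), (1,2), (2,3)
observed = [45, 35, 20]

# Under uniform(0, 3) each range has probability 1/3
expected = [sum(observed) / 3] * 3

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 2 degrees of freedom only: P(chi-square > x) = exp(-x / 2)
p_value = math.exp(-chi2_stat / 2)

print(round(chi2_stat, 2), round(p_value, 4))
```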

11 (i) (a) The required probability is

$P(T > t_k) = 1 - P(T \le t_k) = 1 - F_T(t_k) = 1 - (1 - e^{-\lambda t_k}) = e^{-\lambda t_k}$ (using formulae or by integration).

(b) The likelihood function is given by:

$L(\lambda) = \prod_{i=1}^{k} f(t_i) \prod_{j=k+1}^{n} P(T > t_k)$

$= \prod_{i=1}^{k} \lambda e^{-\lambda t_i} \prod_{j=k+1}^{n} e^{-\lambda t_k} = \lambda^k e^{-\lambda \sum_{i=1}^{k} t_i} \, e^{-(n-k)\lambda t_k}$


For the MLE:

$l(\lambda) = \log L(\lambda) = k\log(\lambda) - \lambda\sum_{i=1}^{k} t_i - (n-k)\lambda t_k$

$l'(\lambda) = \frac{k}{\lambda} - \sum_{i=1}^{k} t_i - (n-k)t_k$

$l'(\lambda) = 0 \Rightarrow \hat{\lambda} = \frac{k}{\sum_{i=1}^{k} t_i + (n-k)t_k}$.

[And $l''(\lambda) = -\frac{k}{\lambda^2} < 0$]

(c) For the observed data,

n = 20, k = 5, tk = 21.54, $\sum_{i=1}^{k} t_i = 54.82$.

$\hat{\lambda} = \frac{k}{\sum_{i=1}^{k} t_i + (n-k)t_k} = \frac{5}{54.82 + 15 \times 21.54} = 0.0132$.
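The estimate in part (i)(c) is the number of observed expiries divided by the total time on test, where each of the n − k unexpired policies contributes t_k (a scheme often called Type II censoring). A short Python sketch, with an illustrative function name:

```python
def mle_censored_at_kth_failure(expiry_times, n):
    """MLE of the exponential rate when observation stops at the k-th expiry.

    expiry_times: the k observed expiry times
    n: total number of policies under investigation
    """
    k = len(expiry_times)
    t_k = max(expiry_times)
    # Observed lifetimes plus (n - k) policies each still in force at time t_k
    total_time_on_test = sum(expiry_times) + (n - k) * t_k
    return k / total_time_on_test


lambda_hat = mle_censored_at_kth_failure([1.03, 6.67, 12.70, 12.88, 21.54], 20)
print(round(lambda_hat, 4))
```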

(ii) (a) We have n policies with independent durations, and each will have expired by the time of termination with probability $p = P(T \le t_0) = 1 - e^{-\lambda t_0}$, or will have not expired with probability 1 − p. Therefore, $K \sim \text{bin}(n, 1 - e^{-\lambda t_0})$

(b) $L(\lambda) \propto (1 - e^{-\lambda t_0})^k (e^{-\lambda t_0})^{n-k}$

$l(\lambda) = \log L(\lambda) = k\log(1 - e^{-\lambda t_0}) - (n-k)\lambda t_0$

$l'(\lambda) = \frac{k t_0 e^{-\lambda t_0}}{1 - e^{-\lambda t_0}} - (n-k)t_0$

$l'(\lambda) = 0 \Rightarrow e^{-\lambda t_0} = \frac{n-k}{n} \Rightarrow \hat{\lambda} = -\frac{1}{t_0}\log\left(1 - \frac{k}{n}\right)$


[OR, observed proportion (k/n) is the MLE of corresponding proportion/probability {1 − exp(−λt0)}; solving for λ leads to same estimate as above.]

(c) Now t0 = 24 and all other involved quantities are as before.

$\hat{\lambda} = -\frac{1}{t_0}\log\left(1 - \frac{k}{n}\right) = -\frac{1}{24}\log\left(1 - \frac{5}{20}\right) = 0.0120$.
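For the fixed-time design in part (ii), only the count of expiries is used. The sketch below evaluates the closed-form MLE and, as a sanity check, confirms that the derivative of the log-likelihood vanishes there. Function names are illustrative.

```python
import math


def mle_fixed_time(k, n, t0):
    """MLE of the exponential rate when k of n policies have expired by time t0."""
    return -math.log(1 - k / n) / t0


def score(lam, k, n, t0):
    """Derivative of the binomial log-likelihood with respect to lambda."""
    q = math.exp(-lam * t0)          # probability a policy is still in force at t0
    return k * t0 * q / (1 - q) - (n - k) * t0


lambda_hat = mle_fixed_time(5, 20, 24)
print(round(lambda_hat, 4), score(lambda_hat, 5, 20, 24))
```

The two designs give similar estimates here (0.0132 and 0.0120) because the fifth expiry, at 21.54 months, falls close to the 24-month cut-off.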

12 (i) (a) F = 1149/289 = 3.98 on (2, 12) degrees of freedom

From Yellow Tables pages 172/3, P-value of the data is between 0.05 and 0.025.

We can reject H0 (the “no schools effects” hypothesis) at the 5% level of testing but not at the 1% level. We have some evidence against the “no schools effects” hypothesis − and conclude that there are school effects (i.e. differences among the underlying means).

(b) School 1 mean = 598/5 = 119.6

t12(0.025) = 2.179

95% CI for school 1 mean is $119.6 \pm 2.179 \times (289/5)^{1/2}$

i.e. 119.6 ± 16.6 or (103.0, 136.2)

(ii) (a) y1• = 598, y2• = 485, y3• = 629, y4• = 566

y•• = 2278, Σy² = 266,788

$SST = 266788 - 2278^2/20 = 7323.8$

$SSB = (598^2 + 485^2 + 629^2 + 566^2)/5 - 2278^2/20 = 2301$

⇒ SSR = 7324 − 2301 = 5023

Source of variation   d.f.     SS    MSS
Between schools          3   2301    767
Residual                16   5023    314
Total                   19   7324

F = 767/314 = 2.44 on (3, 16) degrees of freedom


From Yellow Tables pages 172/3, P-value of the data is just more than 0.1 (>10%)

We do not have sufficiently strong evidence against the “no schools effects” hypothesis, which can stand.

(b) With only three schools involved, the results from one of them (School 2) are sufficiently different from those of the other two to allow us to detect a difference among underlying means. However, the results for the fourth school range across the results for the original three schools − with all four schools in the comparison, the “between schools” sum of squares is no longer so high relative to the residual and we fail to detect differences.

(c) t16(0.025) = 2.120

95% CI is $(119.6 - 97) \pm 2.120\left\{314\left(\frac{1}{5} + \frac{1}{5}\right)\right\}^{0.5}$

i.e. 22.6 ± 23.76 or (−1.2, 46.4)
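The four-school analysis in Question 12(ii) can be reproduced from the raw scores given in the question paper. The following Python sketch computes the ANOVA quantities and the confidence interval for the School 1 minus School 2 difference; the t value 2.120 is taken from the solution above rather than computed, and the function and variable names are illustrative.

```python
import math

scores = {
    "School 1": [105, 134, 96, 147, 116],
    "School 2": [103, 81, 91, 100, 110],
    "School 3": [137, 115, 105, 123, 149],
    "School 4": [112, 140, 88, 103, 123],
}


def one_way_anova(groups):
    """Return SST, SSB, SSR, residual mean square and F for a one-way layout."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    correction = sum(values) ** 2 / n
    sst = sum(v * v for v in values) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssr = sst - ssb
    ms_resid = ssr / (n - k)
    f_stat = (ssb / (k - 1)) / ms_resid
    return sst, ssb, ssr, ms_resid, f_stat


sst, ssb, ssr, ms_resid, f_stat = one_way_anova(list(scores.values()))

# 95% CI for the School 1 - School 2 difference, using the pooled residual variance
T16_UPPER_2_5 = 2.120  # upper 2.5% point of t on 16 d.f., as quoted in the solution
mean_1 = sum(scores["School 1"]) / 5
mean_2 = sum(scores["School 2"]) / 5
half_width = T16_UPPER_2_5 * math.sqrt(ms_resid * (1 / 5 + 1 / 5))
ci = (mean_1 - mean_2 - half_width, mean_1 - mean_2 + half_width)

print(round(sst, 1), round(ssb, 1), round(ssr, 1), round(f_stat, 2), ci)
```

Small differences from the printed figures arise only because the report rounds the residual mean square to 314 before use.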

13 (i) A clearly labelled scatterplot:

There seems to be a positive linear relationship between blood flow and auricular pressure.


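(ii) n = 12

$S_{xx} = 1381.85 - \frac{126.5^2}{12} = 48.3292$

$S_{yy} = 1251 - \frac{113^2}{12} = 186.9167$

$S_{xy} = 1272.2 - \frac{(126.5)(113)}{12} = 80.9917$

$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{80.9917}{48.3292} = 1.676$

$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{1}{12}(113 - 1.6758 \times 126.5) = -8.249$

Fitted line is y = −8.249 + 1.676x

(iii) (a) $\frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{n-2}$ where $\hat{\sigma}^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$

$P\left[-t_{n-2}(2.5\%) < \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / S_{xx}}} < t_{n-2}(2.5\%)\right] = 0.95$

Rearrangement results in the 95% confidence interval for β

$\hat{\beta} \pm t_{n-2}(2.5\%)\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$

Here: $\hat{\sigma}^2 = \frac{1}{10}\left(186.9167 - \frac{(80.9917)^2}{48.3292}\right) = 5.1188$

95% CI is 1.676 ± 2.228(0.3254) ⇒ 1.676 ± 0.725 ⇒ (0.95, 2.40)

(b) As 1.5 lies comfortably inside this confidence interval, then there is no evidence at all against the hypothesis that β = 1.5.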
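The fitted line and the slope interval in Question 13 follow mechanically from the five summary statistics given in the question. A Python sketch is shown below; the t value 2.228 is the tabulated figure used in the solution, and the variable names are illustrative.

```python
import math

# Summary statistics from Question 13
n = 12
sum_x, sum_x2 = 126.5, 1381.85
sum_y, sum_y2 = 113, 1251
sum_xy = 1272.2

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares line y = alpha_hat + beta_hat * x
beta_hat = s_xy / s_xx
alpha_hat = sum_y / n - beta_hat * sum_x / n

# Error variance estimate and standard error of the slope
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
se_beta = math.sqrt(sigma2_hat / s_xx)

T10_UPPER_2_5 = 2.228  # upper 2.5% point of t on 10 d.f., as used in the solution
slope_ci = (beta_hat - T10_UPPER_2_5 * se_beta, beta_hat + T10_UPPER_2_5 * se_beta)

print(round(alpha_hat, 3), round(beta_hat, 3), round(sigma2_hat, 4), slope_ci)
```

Since 1.5 lies inside the resulting interval, the code agrees with the comment in part (iii)(b).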


(iv) (a) $\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$

$P\left[\chi^2_{n-2}(97.5\%) < \frac{(n-2)\hat{\sigma}^2}{\sigma^2} < \chi^2_{n-2}(2.5\%)\right] = 0.95$

Rearrangement results in the 95% confidence interval for σ²

$\frac{(n-2)\hat{\sigma}^2}{\chi^2_{n-2}(2.5\%)} < \sigma^2 < \frac{(n-2)\hat{\sigma}^2}{\chi^2_{n-2}(97.5\%)}$

Here 95% CI is $\frac{10(5.1188)}{20.48} < \sigma^2 < \frac{10(5.1188)}{3.247}$ ⇒ (2.50, 15.76)

(b) 95% CI for σ is $(\sqrt{2.50}, \sqrt{15.76})$ ⇒ (1.58, 3.97)
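The interval arithmetic in part (iv) is short enough to check in a few lines. The sketch below takes the two chi-square percentage points (20.48 and 3.247 on 10 d.f.) from the solution rather than computing them; the function name is illustrative.

```python
import math


def variance_ci(sigma2_hat, df, chi2_upper_point, chi2_lower_point):
    """CI for the error variance from the pivot df * sigma2_hat / sigma^2."""
    return df * sigma2_hat / chi2_upper_point, df * sigma2_hat / chi2_lower_point


# Values from the solution: sigma2_hat = 5.1188 on 10 d.f.
var_low, var_high = variance_ci(5.1188, 10, 20.48, 3.247)

# Taking square roots preserves the coverage because sqrt is increasing
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(round(var_low, 2), round(var_high, 2), round(sd_low, 2), round(sd_high, 2))
```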

END OF EXAMINERS’ REPORT


Faculty of Actuaries Institute of Actuaries

EXAMINATION

18 September 2008 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 S2008 © Institute of Actuaries


1 The mean of a sample of 30 claim amounts arising from a certain kind of insurance policy is £5,200. Six of these claim amounts have mean £8,000 while ten others have mean £3,100.

Calculate the mean of the remaining claim amounts in this sample. [3]

2 Five years ago a financial institution issued a specialised type of investment bond and investors had the option to cash in after 1, 2, 3, 4 or 5 years. The following table gives a frequency distribution showing the numbers of those investors who cashed in at each stage.

duration (length of time held before being cashed in)
1 year   2 years   3 years   4 years   5 years
  130      151        97        64        98

Calculate the sample mean and standard deviation of the duration of these bonds before being cashed in. [4]

3 (i) Let Y be the sum of two independent random variables X1 and X2, that is, Y = X1 + X2.

Show that the moment generating function (mgf) of Y is the product of the mgfs of X1 and X2. [2]

(ii) Let X1 and X2 be independent gamma random variables with parameters (α1, λ) and (α2, λ), respectively.

Use mgfs to show that Y = X1 + X2 is also a gamma random variable and specify its parameters. [2]
[Total 4]

4 A random sample of 15 observations is taken from a normally distributed population of values. The sample mean is 94.2 and the sample variance is 24.86.

Calculate a 99% confidence interval for the population mean. [3]

5 The number of claims, X, arising on each policy in a certain portfolio depends on another random variable Y. X is considered to follow a Poisson distribution with mean Y. The variable Y itself is assumed to have a gamma distribution with parameters (a, b).

Find expressions for the unconditional moments E[X] and E[X²] using appropriate conditional moments. [4]


6 Suppose that the time T, measured in days, until the next claim arises under a portfolio of non-life insurance policies, follows an exponential distribution with mean 2.

(i) Find the probability that no claim is made in the next one day period. [2]

(ii) The median of a random variable is defined as the value for which the cumulative distribution function of the variable is equal to 0.5.

Find the median time until the next claim arises. [2]

(iii) Now let T1, T2, …, T30 be the times (in days) until the next claim arises under each one of 30 similar portfolios of non-life insurance policies, and assume that each Ti, i = 1,…,30, follows an exponential distribution with mean 2, independently of all others.

Calculate, approximately, the probability that the total of all 30 times which elapse until a claim arises on each of the portfolios exceeds 45 days. [4]
[Total 8]

7 Let N be the number of claims arising on a group of policies in a period of one week and suppose that N follows a Poisson distribution with mean 60.

Let X1, X2, …, XN be the corresponding claim amounts and suppose that, independently of N, these are independent and identically distributed with mean £500 and standard deviation £400.

Let $S = \sum_{i=1}^{N} X_i$ be the total claim amount for the period of one week.

(i) Determine the mean and the standard deviation of S. [2]

(ii) Explain why the distribution of S can be taken as approximately normal, and hence calculate, approximately, the probability that S is greater than £40,000. [3]
[Total 5]


8 (i) Use the following uniform(0,1) random numbers

0.9236 , 0.2578

and a suitable table of probabilities to simulate two observations of the random variable X, where X ~ N(200,100). [3]

(ii) Use the following uniform(0,1) random numbers

0.3287 , 0.9142

to simulate two observations of the random variable Y, where Y has an exponential distribution with mean 100. [3]
[Total 6]

9 A random sample of four insurance policies of a certain type was examined for each of three insurance companies and the sums insured were recorded. An analysis of variance was then conducted to test the hypothesis that there are no differences in the means of the sums insured under such policies by the three companies.

The total sum of squares was found to be SST = 420.05 and the between-companies sum of squares was found to be SSB = 337.32.

(i) Perform the analysis of variance to test the above hypothesis and state your conclusion. [4]

(ii) State clearly any assumptions that you made in performing the analysis in (i). [2]

(iii) The plot of the residuals of this analysis of variance against the associated fitted values is given below.

[Plot: Residuals (vertical axis, −4 to 4) against Fitted values (horizontal axis, 18 to 32)]

Comment briefly on the validity of the test performed in (i), basing your answer on the above plot. [2]
[Total 8]


10 When a new claim comes into an office it is screened at a first stage and has a probability θ of being cleared for progress, otherwise it is rejected. If it clears the first stage, it is then independently screened at a second stage and has the same probability θ of being cleared for progress, otherwise it is rejected.

(i) Explain clearly why the probability of a claim being rejected at the first stage is 1 − θ, of being rejected at the second stage is θ(1 − θ) and of progressing after the two stages is θ². [3]

(ii) For a sample of n independent claims which came into the office x1 were rejected at the first stage, x2 were rejected at the second stage and x3 progressed after the two stages (x1 + x2 + x3 = n).

(a) Write down the likelihood L(θ) for this sample and hence show that the derivative of the log-likelihood is given by

$\frac{\partial}{\partial\theta}\log L(\theta) = \frac{x_2 + 2x_3}{\theta} - \frac{x_1 + x_2}{1 - \theta}$.

(b) Show that the maximum likelihood estimator (MLE) is given by

$\hat{\theta} = \frac{x_2 + 2x_3}{x_1 + 2x_2 + 2x_3}$. [7]

(iii) (a) Determine the second derivative $\frac{\partial^2}{\partial\theta^2}\log L(\theta)$ of the log-likelihood in part (ii) above and hence show that the Cramer-Rao lower bound (CRlb) is given by $\frac{\theta(1 - \theta)}{n(1 + \theta)}$.

(b) Use the asymptotic distribution for the MLE $\hat{\theta}$ with the CRlb evaluated at $\hat{\theta}$ to obtain an approximate large-sample 95% confidence interval for θ expressing it simply in terms of $\hat{\theta}$ and n. [7]

(iv) For a sample of 1,000 independent claims, 110 were rejected at the first stage, 96 were rejected at the second stage and 794 progressed after the two stages.

Calculate the MLE $\hat{\theta}$ together with an approximate 95% confidence interval for θ. [3]
[Total 20]


11 A study was conducted to investigate lengths of stay, in days, of short-term stay patients in a particular hospital. Independent random samples of 40 male patients and 35 female patients were selected and the lengths of stay of these patients are given in the following tables:

Male                          Female
 4  8  2  6  9  6  4 10        2  7  5  1  9  1  3
 6  6  8  7  3  5  6  1        8  6  4  1  5  4  4
10  7  5  7  7  6  9  2        3  5  4  6  5  1  8
 1  7  2  8 11  5  6  2        7  2  4  4 11  6  8
 1  8  1  3  9  3  1  3        3  6  5  9  6  2  3

Male: Σx = 215, Σx² = 1,481    Female: Σx = 168, Σx² = 1,026

The male observations are assumed to be normally distributed with mean μ1 and standard deviation σ1, and independently the female observations are assumed to be normally distributed with mean μ2 and standard deviation σ2.

(i) Suppose that it is known that σ1 = 3.0 days and σ2 = 2.5 days.

(a) Construct a 95% confidence interval for the difference between the mean length of stay for males and the mean length of stay for females, that is for μ1 − μ2.

(b) Comment briefly on any implications of this confidence interval. [6]

(ii) Suppose now that σ1 and σ2 are unknown.

(a) Perform a two-sample t-test to investigate whether there is a difference between the mean length of stay for males and the mean length of stay for females, assuming that σ1 and σ2 are equal.

(b) Show that the variances in the male and female samples are not significantly different at the 5% level, and comment briefly with reference to the validity of the test conducted in (ii)(a).

(c) Suppose you are not prepared to assume more than you feel is absolutely necessary – in particular you do not want to assume that σ1 and σ2 are equal, nor that the observations necessarily come from normal populations.

Perform an alternative (large-sample) test to that conducted in part (ii)(a), “to investigate whether there is a difference between the mean length of stay for males and the mean length of stay for females”, and compare the results of the test with the results of the test obtained in part (ii)(a). [11]

[Total 17]


12 Consider a situation in which the data consist of two responses at each of five values of an explanatory variable (x = 1, 2, 3, 4, 5), so we have a data set with ten responses (y), as in the following table:

x    1   1   2   2   3   3   4   4   5   5
y   12  19  18  35  19  44  32  53  44  65

For these data Σx = 30, Σy = 341, Σx² = 110, Σy² = 14,345, Σxy = 1,211

(i) You are asked to carry out a linear regression analysis using these data.

(a) Draw a plot of the data to show the relationship between the response and explanatory values.

(b) Calculate the total, regression, and residual sums of squares for a least-squares linear regression analysis of y on x, and hence calculate the value of R², the coefficient of determination.

(c) Determine the equation of the fitted regression line.

(d) Calculate a 95% confidence interval for the slope of the underlying regression line. [12]

(ii) A colleague suggests that it will be simpler and will produce the same results if we use the following reduced data, in which the two responses at each x value are replaced by their mean:

x      1     2     3     4     5
y   15.5  26.5  31.5  42.5  54.5

The details of the regression analysis for these data are given in the box below.

Regression equation: y = 5.90 + 9.40 x

             Coef   Stdev   t-ratio   p-val
Intercept   5.900   2.233      2.64   0.078
x           9.400   0.673     13.96   0.001

s = 2.129    R-sq = 98.5%

Analysis of Variance
Source       df       SS       MS        F   p-val
Regression    1   883.60   883.60   194.91   0.001
Error         3    13.60     4.53
Total         4   897.20

Discuss the similarities and the differences between the two approaches and their results, in particular addressing the claim by the colleague that the two analyses will produce “the same results”. [6]
[Total 18]

END OF PAPER


Faculty of Actuaries Institute of Actuaries

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

September 2008

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

R D Muckart
Chairman of the Board of Examiners
November 2008

Comments

The paper was answered well overall and there are no particular topics that stand out as being poorly attempted. Similarly there were no particular misunderstandings widely evident, and no particular errors were made so repeatedly as to be worthy of comment.

© Faculty of Actuaries © Institute of Actuaries

Page 136: ct32005-2010

Subject CT3 (Probability and Mathematical Statistics Core Technical) — September 2008 — Examiners’ Report

Page 2

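1 $n = 30$, $\bar{x} = 5200$; $n_1 = 6$, $\bar{x}_1 = 8000$; $n_2 = 10$, $\bar{x}_2 = 3100$; $n_3 = 14$

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{\sum x}{n}$$

$$\Rightarrow \bar{x}_3 = \frac{n\bar{x} - n_1\bar{x}_1 - n_2\bar{x}_2}{n_3} = \frac{30 \times 5200 - 6 \times 8000 - 10 \times 3100}{14} = \frac{77000}{14} = 5500$$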

2 data: $\Sigma f = 540$, $\Sigma fx = 1469$, $\Sigma fx^2 = 5081$

mean $= \frac{1469}{540} = 2.72$ years

variance $= \frac{1}{539}\left(5081 - \frac{1469^2}{540}\right) = 2.0126$ ∴ s.d. = 1.42 years

3 (i) $M_Y(t) = E(e^{tY}) = E(e^{t(X_1 + X_2)}) = E(e^{tX_1})\,E(e^{tX_2}) = M_{X_1}(t)\,M_{X_2}(t)$

(ii) $M_{X_i}(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha_i}$

$\therefore M_Y(t) = \left(1 - \frac{t}{\lambda}\right)^{-(\alpha_1 + \alpha_2)}$

so that Y is a gamma r.v. with parameters $(\alpha_1 + \alpha_2, \lambda)$.

4 $t_{14}(0.005) = 2.977$

99% CI is $94.2 \pm 2.977\sqrt{\frac{24.86}{15}}$, i.e. $94.2 \pm 3.83$, i.e. (90.37, 98.03)


5 $E(X) = E_Y\{E_X(X \mid Y)\} = E(Y) = \frac{a}{b}$.

$E(X^2) = \mathrm{var}(X) + [E(X)]^2$ with

$$\mathrm{var}(X) = E_Y\{\mathrm{var}_X(X \mid Y)\} + \mathrm{var}_Y\{E_X(X \mid Y)\} = E(Y) + \mathrm{var}(Y) = \frac{a}{b} + \frac{a}{b^2}$$

giving

$$E(X^2) = \frac{a}{b} + \frac{a}{b^2} + \frac{a^2}{b^2}.$$

[OR

$$E(X^2) = E_Y\{E_X(X^2 \mid Y)\} = E_Y\{\mathrm{var}_X(X \mid Y) + [E_X(X \mid Y)]^2\} = E(Y) + E(Y^2) = E(Y) + \mathrm{var}(Y) + [E(Y)]^2 = \frac{a}{b} + \frac{a}{b^2} + \frac{a^2}{b^2}.]$$

6 (i) T ~ Exp(0.5) and therefore $P(T > 1) = e^{-0.5 \times 1} = 0.6065$.

(ii) The median, M, is such that

$$\int_0^M f(t)\,dt = 0.5 \Rightarrow \int_0^M 0.5\,e^{-0.5t}\,dt = 0.5$$

which gives $1 - e^{-0.5M} = 0.5 \Rightarrow M = -2\log(0.5)$, or $M = 2\log(2) = 1.386$. (Note: the cdf is available from the Yellow Book, p11.)

(iii) From CLT, $Y = \sum_{i=1}^{30} T_i \sim N(30 \times 2,\ 30 \times 4)$, i.e. $N(60, 120)$, approximately.

Then,

$$P(Y > 45) = P\left(Z > \frac{45 - 60}{\sqrt{120}}\right) = P(Z > -1.3693) = P(Z < 1.3693) = 0.915.$$


[OR Y ~ gamma(30, 1/2), that is $Y \sim \chi^2_{60}$, from which we can then use the normal approximation as above, or get P(Y > 45) = 0.922 (approximately) by interpolating in tables of percentage points of $\chi^2_{60}$ (Yellow Book p168).]
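The three results for question 6 can be checked numerically. The following is a minimal sketch, not part of the examiners' report: it assumes the Exp(0.5) model stated above and uses the standard finite-sum form of the gamma tail for an integer shape parameter; the helper names `normal_cdf` and `erlang_sf` are illustrative only.

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erlang_sf(y, shape, rate):
    """P(Y > y) for Y ~ gamma(shape, rate) with integer shape:
    sum_{k=0}^{shape-1} exp(-rate*y) (rate*y)^k / k!."""
    m = rate * y
    term = math.exp(-m)
    total = term
    for k in range(1, shape):
        term *= m / k
        total += term
    return total

n, rate = 30, 0.5
median = math.log(2) / rate                 # part (ii): 2 log 2
mean, var = n / rate, n / rate ** 2         # N(60, 120) from the CLT
clt = 1.0 - normal_cdf((45 - mean) / math.sqrt(var))
exact = erlang_sf(45, n, rate)              # gamma(30, 1/2) tail

print(f"median = {median:.3f}, CLT = {clt:.3f}, exact = {exact:.3f}")
```

The exact tail is the quantity that the report approximates by interpolating in the chi-square tables.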

7 (i) $E(S) = E(N)E(X) = (60)(500) = £30{,}000$

$V(S) = E(N)V(X) + V(N)[E(X)]^2 = (60)(400^2) + (60)(500^2) = 24{,}600{,}000$

$\therefore sd(S) = £4{,}960$

(ii) As S is the sum of a large number of i.i.d. variables, then the central limit theorem gives an approximate normal distribution for S.

$$P(S > 40000) = P\left(Z > \frac{40000 - 30000}{4960}\right) = P(Z > 2.016) = 1 - 0.9781 = 0.0219$$

[Note: 2.02 leading to 0.0217 is also acceptable.]

8 (i) From Yellow Book Table
P(Z < 1.43) = 0.9236 giving x value (10*1.43) + 200 = 214.3
P(Z < −0.65) = 0.2578 giving x value (10*(−0.65)) + 200 = 193.5

(ii) Setting r = P(Y < y) = 1 – exp(−y/100) ⇒ y = −100*log(1 − r)
r = 0.3287 ⇒ y = −100log(0.6713) = 39.85
r = 0.9142 ⇒ y = −100log(0.0858) = 245.6

Note: We can do away with the step of subtracting r from 1 and use y = −100*log(r). This gives y = 111.3, 8.971.

9 (i) SSR = SST – SSB = 420.05 – 337.32 = 82.73. The degrees of freedom are 3 – 1 = 2 for the treatment (company) SS, and 12 – 1 – 2 = 9 for the residual SS.


These give $F = \frac{337.32/2}{82.73/9} = 18.348$.

From tables, $F_{0.01,2,9} = 8.022$, and therefore we have strong evidence against the hypothesis that the means of the insured sums are equal for the 3 companies.

(ii) To perform the ANOVA we assume that the data follow normal distributions and that their variance is constant.

(iii) The variance of the residuals seems to depend on the company from which the data come. This violates the assumption of constant variance in the response variable, and therefore the analysis may not be valid.

10 (i) P(rejected at 1st) = 1 – P(cleared at 1st) = 1 – θ
P(rejected at 2nd) = P(cleared at 1st) P(rejected at 2nd | cleared at 1st) = θ(1 – θ)
P(progressing after two) = P(cleared at 1st) P(cleared at 2nd) = θ²

(ii) (a) $L(\theta) = (1-\theta)^{x_1}\,[\theta(1-\theta)]^{x_2}\,[\theta^2]^{x_3} = \theta^{x_2 + 2x_3}(1-\theta)^{x_1 + x_2}$

$\therefore \log L(\theta) = (x_2 + 2x_3)\log\theta + (x_1 + x_2)\log(1-\theta)$

$\therefore \frac{\partial}{\partial\theta}\log L(\theta) = \frac{x_2 + 2x_3}{\theta} - \frac{x_1 + x_2}{1-\theta}$

(b) equate to zero for MLE

$\therefore \theta(x_1 + x_2) = (1-\theta)(x_2 + 2x_3)$

$\therefore \theta(x_1 + 2x_2 + 2x_3) = x_2 + 2x_3$

$\therefore \hat\theta = \frac{x_2 + 2x_3}{x_1 + 2x_2 + 2x_3}$

(iii) (a) $\frac{\partial^2}{\partial\theta^2}\log L(\theta) = -\frac{x_2 + 2x_3}{\theta^2} - \frac{x_1 + x_2}{(1-\theta)^2}$

$$E\left\{\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right\} = -\frac{n\theta(1-\theta) + 2n\theta^2}{\theta^2} - \frac{n(1-\theta) + n\theta(1-\theta)}{(1-\theta)^2}$$


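$$= -\frac{n}{\theta}(1+\theta) - \frac{n}{1-\theta}(1+\theta) = -n(1+\theta)\left(\frac{1}{\theta} + \frac{1}{1-\theta}\right) = -\frac{n(1+\theta)}{\theta(1-\theta)}$$

$$\text{CRlb} = \frac{1}{-E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right]} = \frac{\theta(1-\theta)}{n(1+\theta)}$$

(b) $\hat\theta \approx N(\theta, \text{CRlb})$ for large n

using CRlb $= \frac{\hat\theta(1-\hat\theta)}{n(1+\hat\theta)}$, then $\hat\theta \approx N\left(\theta, \frac{\hat\theta(1-\hat\theta)}{n(1+\hat\theta)}\right)$

95% CI is $\hat\theta \pm 1.96\sqrt{\frac{\hat\theta(1-\hat\theta)}{n(1+\hat\theta)}}$

(iv) $\hat\theta = \frac{96 + 2(794)}{110 + 2(96) + 2(794)} = \frac{1684}{1890} = 0.8910$

CRlb $\approx \frac{0.8910(1 - 0.8910)}{1000(1 + 0.8910)} = 0.0000514$

$\therefore \sqrt{\text{CRlb}} = 0.00717$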

∴ 95% CI is $0.8910 \pm 1.96(0.00717) \Rightarrow 0.891 \pm 0.014$ or (0.877, 0.905)

11 (i) (a) Males: $n_1 = 40$, $\bar{x}_1 = 215/40 = 5.375$
Females: $n_2 = 35$, $\bar{x}_2 = 168/35 = 4.8$

95% CI:

$$\bar{x}_1 - \bar{x}_2 \pm z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 5.375 - 4.8 \pm 1.96\sqrt{\frac{3^2}{40} + \frac{2.5^2}{35}}$$

= 0.575 ± (1.96)(0.6353) = 0.575 ± 1.245 or (–0.67, 1.82)

(b) As this CI includes the value 0 we would not eliminate the possibility that the males and females have the same expected length of stay.


(ii) (a) $s_1^2 = (1481 - 215^2/40)/39 = 8.34295$

$s_2^2 = (1026 - 168^2/35)/34 = 6.45882$

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(39)(8.34295) + (34)(6.45882)}{40 + 35 - 2} = 7.46541$$

Two sample t-test

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{5.375 - 4.8}{\sqrt{7.46541\left(\frac{1}{40} + \frac{1}{35}\right)}} = 0.909$$

$t_{73}(0.025) = 1.996$ (by interpolation of 2.000 and 1.980 for 60 df and 120 df) [OR just quote the N(0,1) value 1.96 in place of the $t_{73}$ value.]

Therefore, at the 5% significance level, there is no evidence to reject the null hypothesis that the means for males and females do not differ, and we conclude that the mean lengths of stay are the same.

(b) $\frac{s_1^2}{s_2^2} = \frac{8.34295}{6.45882} = 1.29$

Comparing this to an $F_{39,34}$ distribution, which has a 5% critical point between 2.075 and 1.717 (two-sided test), there is no evidence that the population variances differ. The assumption of common variance was made when conducting the test in (ii)(a), and this seems valid given the result of the test in (ii)(b).

(c) $$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{5.375 - 4.8}{\sqrt{\frac{8.34295}{40} + \frac{6.45882}{35}}} = 0.917$$


Compare with N(0,1), e.g. 1.96 for 5% level test. Therefore we reach exactly the same conclusion as in (ii)(a), but without making the assumptions of equal variances and normal distributions (we have large samples and can rely on CLT).
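The pooled t statistic of part (ii)(a) and the large-sample z statistic of part (ii)(c) differ only in how the standard error of the difference is formed. The sketch below is illustrative only: it recomputes both from the summary totals quoted in the solution (sums 215 and 168, sums of squares 1481 and 1026); the function name `sample_var` is an assumption, not something from the report.

```python
import math

# Summary totals quoted in the solution to question 11
n1, sum1, sumsq1 = 40, 215.0, 1481.0   # males
n2, sum2, sumsq2 = 35, 168.0, 1026.0   # females

def sample_var(n, s, ss):
    """Unbiased sample variance from n, sum and sum of squares."""
    return (ss - s * s / n) / (n - 1)

m1, m2 = sum1 / n1, sum2 / n2
v1, v2 = sample_var(n1, sum1, sumsq1), sample_var(n2, sum2, sumsq2)

# (ii)(a): pooled variance, assumes equal population variances
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_stat = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# (ii)(c): separate variances, relies on large samples and the CLT
z_stat = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

print(f"pooled variance = {pooled:.5f}, t = {t_stat:.3f}, z = {z_stat:.3f}")
```

With sample sizes this close and a variance ratio near 1, the two statistics are almost identical, which is why the conclusions agree.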

12 (i) (a) [Plot of y against x not reproduced.]

(b) $SS_{TOT} = S_{yy} = 14345 - 341^2/10 = 2716.9$

$\Sigma x = 30$, $\Sigma x^2 = 110$ so $S_{xx} = 110 - 30^2/10 = 20$

$S_{xy} = 1211 - 30 \times 341/10 = 188$

$\therefore SS_{REG} = 188^2/20 = 1767.2$

$SS_{RES} = 2716.9 - 1767.2 = 949.7$

$R^2 = 1767.2/2716.9 = 0.650$ (65.0%)

(c) y = a + bx: $\hat{b} = 188/20 = 9.4$

$\hat{a} = 341/10 - 9.4 \times (30/10) = 5.9$

Fitted line is y = 5.9 + 9.4x

(d) $s.e.(\hat{b}) = \left(\frac{949.7/8}{20}\right)^{1/2} = 2.4363$

$t_8(0.025) = 2.306$


95% confidence interval for b is given by 9.4 ± 2.306 × 2.4363, i.e. 9.4 ± 5.62, i.e. (3.78, 15.02)

(ii) When we replace the pair of responses by their mean:

the equation of the fitted line remains the same, but otherwise the analyses do not produce “equivalent results”

the “fit” of the line is very much better [the goodness-of-fit measure $R^2$ increases to a very high value, from 65% to 98.5%]

plus, for example:

the estimate of the slope has a much lower standard error (2.436 drops to 0.6733)

the $SS_{TOT}$ drops hugely (from 2716.9 on 9 df to 897.2 on 4 df)

the residual error ($SS_{RES}$) drops hugely [from 949.7 on 8 df (error variance estimate 118.7) to 13.60 on 3 df (error variance estimate 4.53)]

BUT we lose all information on the variation of the response for a given value of the explanatory variable

Note: these and other relevant comments will receive credit.
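The comparison in part (ii) can be reproduced directly from the two data sets given in the question. This is a minimal sketch using the usual least-squares formulae; the function name `fit` and the returned dictionary keys are illustrative choices, not notation from the report.

```python
import math

def fit(xs, ys):
    """Least-squares fit of y on x; returns the quantities compared in part (ii)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    b = sxy / sxx
    a = sy / n - b * sx / n
    ss_reg = sxy * sxy / sxx
    ss_res = syy - ss_reg
    return {
        "a": a, "b": b,
        "r2": ss_reg / syy,
        "ss_res": ss_res,
        "se_b": math.sqrt(ss_res / (n - 2) / sxx),
    }

# Full data: two responses at each x value
full = fit([1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
           [12, 19, 18, 35, 19, 44, 32, 53, 44, 65])
# Reduced data: each pair of responses replaced by its mean
reduced = fit([1, 2, 3, 4, 5], [15.5, 26.5, 31.5, 42.5, 54.5])

for name, res in (("full", full), ("reduced", reduced)):
    print(name, {k: round(v, 4) for k, v in res.items()})
```

Both fits return the same intercept and slope, while the R-squared, residual sum of squares and slope standard error differ, in line with the comments above.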

END OF EXAMINERS’ REPORT

Page 144: ct32005-2010

Faculty of Actuaries Institute of Actuaries

EXAMINATION

30 April 2009 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 13 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 A2009 © Institute of Actuaries


1 A random sample of 12 claim amounts (in units of £1,000) on a general insurance portfolio is given by:

14.9 12.4 19.4 3.1 17.6 21.5 15.3 20.1 18.8 11.4 46.2 16.2

For these data: $\Sigma x = 216.9$, $\Sigma x^2 = 5{,}052.13$, sample mean $\bar{x}$ = £18,075, sample median = £16,900, sample standard deviation s = £10,143.

Calculate the sample mean, median, and standard deviation of the sample (of size 10) which remains after we remove the claim amounts 3.1 and 46.2 from the original sample (you should show intermediate working and/or give justifications for your answers). [6]

2 Consider three events A, B, and C for which A and C are independent, and B and C are mutually exclusive. You are given the probabilities P(A) = 0.3, P(B) = 0.5, P(C) = 0.2 and P(A∩B) = 0.1.

Find the probability that none of A, B, or C occurs. [3]

3 The random variable X has probability density function

$f(x) = k(1 - x)(1 + x), \quad 0 < x < 1$,

where k is a positive constant.

(i) Show that k = 1.5. [2]

(ii) Calculate the probability P(X > 0.25). [2]
[Total 4]

4 Let the random variable Y denote the size (in units of £1,000) of the loss per claim sustained in a particular line of insurance. Suppose that Y follows a chi-square distribution with 2 degrees of freedom. Two such claims are randomly chosen and their corresponding losses are assumed to be independent of each other.

(i) Determine the mean and the variance of the total loss from the two claims. [2]

(ii) Find the value of k such that there is a probability of 0.95 that the total loss from the two claims exceeds k. [2]
[Total 4]

5 The human resources department of a large insurance company currently estimates that 82% of new employees recruited by their call centres will still be employed by the company after one year. A recent extension to the call centre business led to 280 new employees being recruited.

Calculate an approximate value for the probability that at least 240 of these new employees will still be employed by the company after one year. [3]


6 The variables $X_1, X_2, \ldots, X_{40}$ give the size (in units of £100) of each of 40 claims in a random sample of claims arising from damage to cars by vandals. The size of each claim is assumed to follow a gamma distribution with parameters α = 4 and λ = 0.5 and each is independent of all others. Let $\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i$ be the random variable giving the mean size of such a sample.

(i) State the approximate sampling distribution of $\bar{X}$ and determine its parameters. [2]

(ii) Determine approximately the median of $\bar{X}$. [1]
[Total 3]

7 A survey is undertaken to investigate the frequency of motor accidents at a certain intersection.

It is assumed that, independently for each week, the number of accidents follows a Poisson distribution with mean λ.

(i) In a single week of observation two accidents occur. Determine a 95% confidence interval for λ, using tables of “Probabilities for the Poisson distribution”. [3]

(ii) In an observation period of 30 weeks an average of 2.4 accidents is recorded. Determine a 95% confidence interval for λ, using a normal approximation. [3]

(iii) Comment on your answers in parts (i) and (ii) above. [1]
[Total 7]

8 A random sample of 25 recent claim amounts in a general insurance context is taken from a population that you may assume is normally distributed. In units of £1,000, the sample mean is $\bar{x} = 9.416$ and the sample standard deviation is s = 2.105.

Calculate a 95% one-sided upper confidence limit (that is, the upper limit k of a confidence interval of the form (0, k)) for the standard deviation of the claim amounts in the population. [5]


9 An analysis of variance investigation with samples of size eight for each of four treatments results in the following ANOVA table.

Source of variation    d.f.   SS      MSS
Between treatments      3      6716   2239
Residual               28      3362    120
Total                  31     10078

(i) Calculate the observed F statistic, specify an interval in which the resulting P-value lies, and state your conclusion clearly. [3]

(ii) The four treatment means are:

$\bar{y}_1 = 85.0$, $\bar{y}_2 = 66.5$, $\bar{y}_3 = 59.0$, and $\bar{y}_4 = 95.5$.

(a) Calculate the least significant difference between pairs of means using a 5% level.

(b) List the means in order, illustrate the non-significant pairs using suitable underlining, and comment briefly. [3]
[Total 6]

10 For a group of policies the probability distribution of the total number of claims, N, arising during a period of one year is given by

P(N = 0) = 0.70, P(N = 1) = 0.15, P(N = 2) = 0.10, P(N = 3) = 0.05.

Each claim amount, X (in units of £1,000), follows a gamma distribution with parameters α = 2 and λ = 0.1 independently of each other claim amount and of the number of claims.

Calculate the expected value and the standard deviation of the total of the claim amounts for a period of one year. [5]


11 The number of claims, X, which arise in a year on each policy of a particular class is to be modelled as a Poisson random variable with mean λ. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from the distribution of X, and let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

(i) (a) Use moment generating functions to show that $\sum_{i=1}^{n} X_i$ has a Poisson distribution with mean nλ.

(b) State, with a brief reason, whether or not the variable $2X_1 + 5$ has a Poisson distribution.

(c) State, with a brief reason, whether or not $\bar{X}$ has a Poisson distribution in the case that n = 2.

(d) State the approximate distribution of $\bar{X}$ in the case that n is large. [8]

An actuary is interested in the level of claims being experienced and wants in particular to test the hypotheses

$H_0$: λ = 1 v $H_1$: λ > 1.

He decides to use a random sample of size n = 100 and the best (most powerful) available test. You may assume that this test rejects $H_0$ for $\bar{x} > k$, for some constant k.

(ii) (a) Show that the value of k for the test with level of significance 0.01 is k = 1.2326.

(b) Calculate the power of the test in part (ii)(a) in the case λ = 1.2 and then in the case λ = 1.5.

(c) Comment briefly on the values of the power of the test obtained in part (ii)(b). [9]
[Total 17]


12 In a genetic plant breeding experiment a total of 1,500 plants were categorised into one of four classes (labelled A, B, C and D) with the following results:

class:       A      B     C     D
frequency:   1071   62    68    299

A genetic model specifies that the probability that an individual plant belongs to each class is given by:

class:         A                       B                       C                       D
probability:   $\frac{1}{4}(2+\theta)$   $\frac{1}{4}(1-\theta)$   $\frac{1}{4}(1-\theta)$   $\frac{1}{4}\theta$

where θ is an unknown parameter such that 0 < θ < 1.

(i) (a) Write down the likelihood for these data and determine the log-likelihood.

(b) Show that the maximum likelihood estimate (MLE) of θ is a solution of the quadratic equation $750\theta^2 - 256\theta - 299 = 0$ and hence that the MLE is given by $\hat\theta = 0.825$. [7]

(ii) (a) Determine the second derivative of the log-likelihood and use this, evaluated at θ = 0.825, to obtain an approximation for the Cramer-Rao lower bound for this situation.

(b) Hence calculate an approximate 95% confidence interval for θ, using the asymptotic distribution of the MLE. [5]

(iii) An extension of the genetic model suggests that the value of θ should be equal to 0.775.

(a) Carry out an appropriate $\chi^2$ test to investigate the extent to which the current data support the extended model with this value of θ (you should calculate and comment on the P-value).

(b) Comment briefly on how this relates to your approximate confidence interval in part (ii)(b). [6]
[Total 18]


13 The following table gives the scores (out of 100) that 10 students obtained on a midterm test (x) and the final examination (y) in a course in statistics.

Midterm x   65  62  50  82  80  68  88  67  90  92
Final y     44  49  54  59  66  67  71  81  89  98

For these data you are given: $S_{xx} = 1{,}760.4$, $S_{yy} = 2{,}737.6$, $S_{xy} = 1{,}529.8$

(i) (a) Draw a scatterplot of the data and comment briefly on the relationship between the score in the final examination and that in the midterm test.

(b) The equation of the line of best fit is given by y = 3.146 + 0.869x. Perform a suitable test involving the slope parameter β, to test the null hypothesis $H_0$: β = 0 against $H_1$: β > 0.

(c) Calculate a 95% confidence interval for the mean final examination score for a midterm score of 75.

(d) Consider now that we require a 95% confidence interval for an individual predicted final examination score for a midterm score of 75. State (giving reasons) whether this interval will be narrower or wider than the one calculated in part (i)(c) above. (You are not asked to calculate the interval.) [13]

The lecturer of this course decides to assess the linear relationship between the score in the final examination and that in the midterm test, by using the sample correlation coefficient r.

The hypothesis $H_0$: ρ = 0 (where ρ denotes the population correlation coefficient) can be tested against $H_1$: ρ > 0, by using the result that under $H_0$ the sampling distribution of the statistic $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ is the $t_{n-2}$ distribution (where n is the size of the sample).

(ii) (a) Show algebraically, that is without referring to the specific data given here, that in general the above statistic and the statistic involving β that you used in (i)(b) produce equivalent tests.

(b) Calculate the value of r for the given data and hence verify numerically the result of part (ii)(a) above. [6]
[Total 19]

END OF PAPER

END OF PAPER

Page 151: ct32005-2010

Faculty of Actuaries Institute of Actuaries

Subject CT3 — Probability and Mathematical Statistics Core Technical

EXAMINERS’ REPORT

April 2009

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

R D Muckart
Chairman of the Board of Examiners
June 2009

Comments

The paper was answered quite well overall. Some questions were answered less well (or by noticeably fewer candidates) than others – they were:

Question 7(i) – confidence intervals based on small samples

Question 9(ii) – differences between pairs of treatment means in an ANOVA context

Question 11(i) – deciding whether or not certain variables derived from Poisson variables are themselves Poisson variables

Question 12 – writing down the correct likelihood function, in which the stated probabilities are raised to powers given by the observed frequencies of occurrence (not multiplied by the frequencies)

There were no other misunderstandings widely evident, and no particular errors were made so repeatedly as to be worthy of comment.

© Faculty of Actuaries © Institute of Actuaries

Page 152: ct32005-2010

Subject CT3 (Probability and Mathematical Statistics Core Technical) — April 2009 — Examiners’ Report

Page 2

1 Revised mean = (216.9 – 3.1 – 46.2)/10 = 16.76, i.e. £16,760

Revised median = original median = £16,900

Revised $\Sigma x^2 = 5052.13 - 3.1^2 - 46.2^2 = 2908.08$

so revised standard deviation $= \left\{\frac{1}{9}\left(2908.08 - \frac{167.6^2}{10}\right)\right\}^{1/2} = 3.31837$, i.e. £3,318.37
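The revised summary statistics can be confirmed from the raw claim amounts given in the question. A small sketch with the standard-library `statistics` module follows; the variable names are illustrative. It also shows why the median is unchanged: the two removed values are the smallest and largest, so the middle pair of ordered values stays the same.

```python
import statistics

# Claim amounts from question 1 (units of £1,000)
claims = [14.9, 12.4, 19.4, 3.1, 17.6, 21.5, 15.3, 20.1, 18.8, 11.4, 46.2, 16.2]

# Removing 3.1 and 46.2 drops the minimum and the maximum
trimmed = sorted(claims)[1:-1]

mean = statistics.mean(trimmed)
median = statistics.median(trimmed)
sd = statistics.stdev(trimmed)          # sample s.d., divisor n - 1

print(f"mean = {mean:.2f}, median = {median:.1f}, s.d. = {sd:.5f}")
```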

2 P(A∪B∪C) = P(A) + P(B) + P(C) – P(A∩B) – P(A∩C) – P(B∩C) + P(A∩B∩C)
= 0.3 + 0.5 + 0.2 – 0.1 – (0.3 × 0.2) = 0.84 (OR via a Venn diagram)

So P(none occur) = 1 – 0.84 = 0.16

3 (i) $$\int_0^1 f(x)\,dx = 1 \Rightarrow k\int_0^1 (1 - x^2)\,dx = 1 \Rightarrow k\left[x - \frac{x^3}{3}\right]_0^1 = 1 \Rightarrow k\left(1 - \frac{1}{3}\right) = 1 \Rightarrow k = 1.5$$

(ii) $$P(X > 0.25) = \int_{0.25}^1 f(x)\,dx = \int_{0.25}^1 1.5(1 - x^2)\,dx = 1.5\left[x - \frac{x^3}{3}\right]_{0.25}^1 = 1.5 \times 0.422 = 0.633.$$

4 (i) $E[Y_i] = 2$, $\mathrm{Var}[Y_i] = 4$

Therefore $E[Y_1 + Y_2] = 4$ and (since $Y_1$, $Y_2$ independent) $\mathrm{Var}[Y_1 + Y_2] = 8$.

So, for total loss, mean = £4,000 and variance = $8 \times 10^6$ (£²).

(OR from $Y_1 + Y_2 \sim \chi^2_4$)

(ii) For the total loss we have $Y_1 + Y_2 \sim \chi^2_4$, so we want a constant k such that $P(\chi^2_4 > k) = 0.95$.

From tables of the $\chi^2_4$ distribution $P(\chi^2_4 > 0.7107) = 0.95$.

∴ The total losses will exceed £710.7 with probability 0.95.


5 Let X be the number still in employment after one year.

$\therefore X \sim bin(280, 0.82) \approx N(229.6, 6.43^2)$

$P(X \geq 240) = P(X > 239.5)$ applying a continuity correction

$$= P\left(Z > \frac{239.5 - 229.6}{6.43}\right) = P(Z > 1.54) = 1 - 0.93822 = 0.062$$
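To see how close the normal approximation is, the exact binomial tail can be summed directly. This is an illustrative sketch only; the exact figure is not quoted in the report, and the helper name `normal_cdf` is an assumption.

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 280, 0.82
mu = n * p
sd = math.sqrt(n * p * (1 - p))

# Normal approximation with continuity correction, as in the solution
approx = 1.0 - normal_cdf((239.5 - mu) / sd)

# Exact binomial tail P(X >= 240)
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(240, n + 1))

print(f"mu = {mu:.1f}, sd = {sd:.2f}, approx = {approx:.4f}, exact = {exact:.4f}")
```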

6 (i) $E[X_i] = 4/0.5 = 8$ and $\mathrm{Var}[X_i] = 4/0.5^2 = 16$ (or by noting $X_i \sim \chi^2_8$).

Using the CLT: $\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i \approx N\left(E[X_i], \frac{\mathrm{Var}[X_i]}{40}\right)$, i.e. N(8, 0.4) approximately.

[Note: The exact distribution of $\bar{X}$ is Gamma(160, 20)]

(ii) The symmetry of the distribution gives: median[$\bar{X}$] = mean[$\bar{X}$] = 8, i.e. £800.

7 (i) For a single observation x from Poisson(λ) a 95% confidence interval for λ is $(\lambda_1, \lambda_2)$ where

$$\sum_{r=x}^{\infty} p(r; \lambda_1) = 0.025 \quad\text{and}\quad \sum_{r=0}^{x} p(r; \lambda_2) = 0.025$$

So for x = 2

$\lambda_1$ is s.t. $\sum_{r=2}^{\infty} p(r; \lambda_1) = 0.025$, i.e. $\sum_{r=0}^{1} p(r; \lambda_1) = 0.975$

From tables $\lambda_1$ is between 0.20 and 0.30 being about 0.24.

$\lambda_2$ is s.t. $\sum_{r=0}^{2} p(r; \lambda_2) = 0.025$

From tables $\lambda_2$ is between 7.00 and 7.25 being about 7.23.

95% confidence interval for λ is (0.24, 7.23).
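The two tail equations in 7(i) can also be solved numerically instead of by interpolating in tables. The sketch below is an illustration, assuming a simple bisection search on an arbitrary bracket (1e-9, 20); the helper names `poisson_cdf` and `bisect_root` are not from the report.

```python
import math

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for r in range(1, x + 1):
        term *= lam / r
        total += term
    return total

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 2
# Lower limit: P(X >= x; lambda_1) = 0.025
lower = bisect_root(lambda lam: (1.0 - poisson_cdf(x - 1, lam)) - 0.025, 1e-9, 20.0)
# Upper limit: P(X <= x; lambda_2) = 0.025
upper = bisect_root(lambda lam: poisson_cdf(x, lam) - 0.025, 1e-9, 20.0)

print(f"95% CI for lambda: ({lower:.3f}, {upper:.3f})")
```

The values agree with the limits read from the tables to the two decimal places quoted.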


(ii) For a sample of n with observed mean $\bar{x}$

$$\frac{\bar{X} - \lambda}{\sqrt{\hat\lambda/n}} \approx N(0,1) \text{ where } \hat\lambda = \bar{X} \quad \left[\text{OR: could use } \frac{\Sigma X - n\lambda}{\sqrt{n\hat\lambda}} \approx N(0,1)\right]$$

giving an approximate 95% confidence interval as $\bar{X} \pm 1.96\sqrt{\frac{\bar{X}}{n}}$

So for n = 30, $\bar{x}$ = 2.4

95% CI is $2.4 \pm 1.96\sqrt{\frac{2.4}{30}} \Rightarrow 2.4 \pm 0.55 \Rightarrow (1.85, 2.95)$

(iii) The CI in (ii) is much narrower due to having more data. (It is also centred higher due to the larger estimate.)

8 Let $\sigma^2$ be the population variance.

$$\frac{24S^2}{\sigma^2} \sim \chi^2_{24} \Rightarrow P\left(\frac{24S^2}{\sigma^2} > 13.85\right) = 0.95 \Rightarrow P\left(\sigma^2 < \frac{24S^2}{13.85}\right) = 0.95$$

$\Rightarrow k^2 = 24 \times 2.105^2 / 13.85 = 7.678$

⇒ k = 2.771 so upper confidence limit for σ is 2.771, i.e. £2771

(OR: CI is (0, 2.771), i.e. (0, £2771)).

9 (i) $F = \frac{2239}{120} = 18.66$ on 3, 28 d.f.

from tables $F_{3,28}(1\%) = 4.568$ ∴ P-value < 0.01 or 0 < P-v < 0.01

So there is overwhelming evidence of a difference between the underlying treatment means.


(ii) (a) LSD $= t_{28}(2.5\%)\sqrt{\hat\sigma^2\left(\frac{1}{8} + \frac{1}{8}\right)} = 2.048\sqrt{120\left(\frac{1}{8} + \frac{1}{8}\right)} = 11.2$

(b) means in order $\bar{y}_{3\cdot} < \bar{y}_{2\cdot} < \bar{y}_{1\cdot} < \bar{y}_{4\cdot}$

underlined thus: $\underline{\bar{y}_{3\cdot}\ \ \bar{y}_{2\cdot}}\quad \underline{\bar{y}_{1\cdot}\ \ \bar{y}_{4\cdot}}$

Treatments 2 & 3 are separate from treatments 1 & 4 which have significantly higher means.

10 $E[N] = 0.7(0) + 0.15(1) + 0.1(2) + 0.05(3) = 0.5$

$E[N^2] = 0.15(1) + 0.1(4) + 0.05(9) = 1.00 \quad \therefore \mathrm{Var}[N] = 1.00 - 0.5^2 = 0.75$

$E[X] = \frac{2}{0.1} = 20$, $\mathrm{Var}[X] = \frac{2}{0.1^2} = 200$

Let S = total of the claim amounts.

$E[S] = E[N]E[X] = (0.5)(20) = 10$, i.e. £10,000.

$V[S] = E[N]\mathrm{Var}[X] + \mathrm{Var}[N][E(X)]^2 = (0.5)(200) + (0.75)(20)^2 = 400$

$\therefore sd(S) = 20$, i.e. £20,000.
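The compound-distribution moments in question 10 follow a fixed recipe, which the sketch below spells out. It assumes only the claim-number distribution and gamma parameters stated in the question; the variable names are illustrative.

```python
import math

# Distribution of the number of claims N (from the question)
pmf_n = {0: 0.70, 1: 0.15, 2: 0.10, 3: 0.05}
# Gamma(alpha, lam) claim amounts, in units of £1,000
alpha, lam = 2, 0.1

e_n = sum(k * p for k, p in pmf_n.items())
var_n = sum(k * k * p for k, p in pmf_n.items()) - e_n ** 2
e_x = alpha / lam
var_x = alpha / lam ** 2

# Moments of the random sum S = X_1 + ... + X_N
e_s = e_n * e_x
var_s = e_n * var_x + var_n * e_x ** 2
sd_s = math.sqrt(var_s)

print(f"E[N] = {e_n:.2f}, Var[N] = {var_n:.2f}, E[S] = {e_s:.1f}, sd(S) = {sd_s:.1f}")
```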

11 (i) (a) Let $S = \sum_{i=1}^{n} X_i$

$M_X(t) = \exp\{\lambda(e^t - 1)\}$

$\Rightarrow M_S(t) = [M_X(t)]^n = [\exp\{\lambda(e^t - 1)\}]^n = \exp\{n\lambda(e^t - 1)\}$

⇒ S ~ Poisson(nλ)

(b) No

One reason is that $E[2X_1 + 5] = 2\lambda + 5$, which is not equal to $V[2X_1 + 5] = 4\lambda$

[Note: another obvious reason is that $2X_1 + 5$ can only take values 5, 7, 9, … , not 0, 1, 2, 3, …]


(c) No

One reason is that $E[\bar{X}] = \lambda$, which is not equal to $V[\bar{X}] = \lambda/2$

[Note: another obvious reason is that $\bar{X}$ can take values 0.5, 1.5, 2.5, … , which a Poisson variable cannot.]

(d) $\bar{X} \approx N\left(\lambda, \frac{\lambda}{n}\right)$

(ii) (a) Under $H_0$, $\bar{X} \sim N\left(1, \frac{1}{100}\right)$ approximately

k is such that $P(\bar{X} > k \mid H_0) = 0.01$ so $\frac{k - 1}{0.1} = 2.3263$

⇒ k = 1.2326

(b) Power(λ) = P(reject $H_0$ | λ)

Power(λ = 1.2) = $P(\bar{X} > 1.2326)$ where $\bar{X} \sim N(1.2, 0.012)$
= P(Z > 0.298) = 0.383

Power(λ = 1.5) = $P(\bar{X} > 1.2326)$ where $\bar{X} \sim N(1.5, 0.015)$
= P(Z > −2.183) = 0.985

(c) Power of test increases as the value of λ increases further away from λ = 1.
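The critical value and the two power values can be reproduced with the standard-library `statistics.NormalDist` class (Python 3.8+). This is a sketch under the same normal approximation used in the solution; the function name `power` is illustrative.

```python
import math
from statistics import NormalDist

n, alpha, lam0 = 100, 0.01, 1.0

# Critical value: reject H0 when the sample mean exceeds k
z_alpha = NormalDist().inv_cdf(1 - alpha)
k = lam0 + z_alpha * math.sqrt(lam0 / n)

def power(lam):
    """P(sample mean > k) when the mean is approximately N(lam, lam/n)."""
    return 1.0 - NormalDist(mu=lam, sigma=math.sqrt(lam / n)).cdf(k)

print(f"k = {k:.4f}")
for lam in (1.2, 1.5):
    print(f"power(lambda = {lam}) = {power(lam):.3f}")
```

Evaluating `power` on a finer grid of λ values would trace out the power function and show the increase noted in part (c).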

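12 (i) (a) $L(\theta) = \left[\tfrac{1}{4}(2+\theta)\right]^{1071}\left[\tfrac{1}{4}(1-\theta)\right]^{62}\left[\tfrac{1}{4}(1-\theta)\right]^{68}\left[\tfrac{1}{4}\theta\right]^{299}$ (× constant)

$\propto (2+\theta)^{1071}(1-\theta)^{130}\theta^{299}$

$\log L(\theta) = const + 1071\log(2+\theta) + 130\log(1-\theta) + 299\log\theta$

(b) $$\frac{d}{d\theta}\log L(\theta) = \frac{1071}{2+\theta} - \frac{130}{1-\theta} + \frac{299}{\theta} = \frac{1071\theta(1-\theta) - 130\theta(2+\theta) + 299(2+\theta)(1-\theta)}{(2+\theta)(1-\theta)\theta}$$

numerator $= 1071\theta - 1071\theta^2 - 260\theta - 130\theta^2 + 598 - 299\theta - 299\theta^2$

equate to zero: $750\theta^2 - 256\theta - 299 = 0$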


$$\therefore \hat\theta = \frac{256 \pm \sqrt{256^2 - 4(750)(-299)}}{2(750)} = 0.17067 \pm 0.65406$$

So MLE $\hat\theta = 0.82473$ (or 0.825 to 3dp) as other root is negative.

(ii) (a) $$\frac{d^2}{d\theta^2}\log L(\theta) = -\frac{1071}{(2+\theta)^2} - \frac{130}{(1-\theta)^2} - \frac{299}{\theta^2}$$

at $\hat\theta = 0.825$, $\frac{d^2}{d\theta^2}\log L(\theta) = -134.20 - 4244.90 - 439.30 = -4818.4$

$$\text{CRlb} = \frac{1}{-E\left[\frac{d^2}{d\theta^2}\log L(\theta)\right]} \approx \frac{1}{4818.4} = 0.0002075$$

(b) $\hat\theta \approx N(\theta, \text{CRlb})$ for large samples and so an approximate 95% CI for θ is $\hat\theta \pm 1.96\sqrt{\text{CRlb}}$

Here: $0.825 \pm 1.96\sqrt{0.0002075} \Rightarrow 0.825 \pm 0.028$ or (0.797, 0.853)

(iii) (a) With θ = 0.775 the four probabilities are 0.69375, 0.05625, 0.05625, 0.19375 respectively and the corresponding expected frequencies are 1040.625, 84.375, 84.375, 290.625.

$$\therefore \chi^2 = \sum\frac{(o - e)^2}{e} = 0.887 + 5.934 + 3.178 + 0.241 = 10.24 \text{ on 3 df}$$

P-value $= P(\chi^2_3 > 10.24) = 1 - 0.983 = 0.017$

These data do not support the model with the value θ = 0.775 in that the probability of observing these data when θ = 0.775 is only 0.017.

[OR: could say “do not support at 5% level, but do support at 1% level”]

(b) This is consistent with the fact that θ = 0.775 is well outside the approximate 95% CI.
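The chain of calculations in question 12 (MLE, approximate Cramer-Rao bound, confidence interval and chi-square test) can be traced in a few lines. The sketch below is illustrative: it uses the quadratic and second derivative derived above, and for the P-value it uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, erfc(√(x/2)) + √(2x/π)·exp(−x/2), in place of the tables used in the report.

```python
import math

obs = [1071, 62, 68, 299]                      # classes A, B, C, D
n = sum(obs)

# (i)(b): positive root of 750 t^2 - 256 t - 299 = 0
a, b, c = 750.0, -256.0, -299.0
theta_hat = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

# (ii): second derivative of the log-likelihood, evaluated at 0.825
def d2_loglik(t):
    return -1071 / (2 + t) ** 2 - 130 / (1 - t) ** 2 - 299 / t ** 2

crlb = -1.0 / d2_loglik(0.825)
half_width = 1.96 * math.sqrt(crlb)
ci = (0.825 - half_width, 0.825 + half_width)

# (iii): chi-square goodness-of-fit test with theta = 0.775
def class_probs(t):
    return [(2 + t) / 4, (1 - t) / 4, (1 - t) / 4, t / 4]

expected = [n * p for p in class_probs(0.775)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
# Upper tail of chi-square with 3 df (closed form for this df only)
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(f"theta_hat = {theta_hat:.5f}, CRlb = {crlb:.7f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"chi2 = {chi2:.2f}, P-value = {p_value:.3f}")
```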


13 (i) (a) The plot is given below.

[Scatterplot not reproduced: final exam score (vertical axis) against midterm score (horizontal axis).]

There seems to be a positive relationship between final and midterm score. However it is not clear if this relationship is linear.

[Following comment also valid: the relationship looks linear but with substantial scatter.]

(b) $$\hat\sigma^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) = \frac{1}{8}\left(2737.6 - \frac{(1529.8)^2}{1760.4}\right) = 176.0241$$

$$s.e.(\hat\beta) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}} = \sqrt{\frac{176.0241}{1760.4}} = 0.3162$$

To test $H_0: \beta = 0$ v $H_1: \beta > 0$, the test statistic is

$$\frac{\hat\beta - 0}{s.e.(\hat\beta)} = \frac{0.869}{0.3162} = 2.748,$$

and under the assumption that the errors of the regression are i.i.d. $N(0, \sigma^2)$ random variables, it has a t distribution with n − 2 = 8 df.


From statistical tables we find $t_{8,0.05} = 1.860$, and $t_{8,0.01} = 2.896$.

We reject the hypothesis $H_0$: β = 0 in favour of $H_1$: β > 0 at the 5% level (but not at the 1% level).

(c) The mean response, $\hat{y}_{new}$, for $x_{new} = 75$ is

$\hat{y}_{new} = 3.146 + 0.869 \times 75 = 68.321$.

Its standard error is calculated as

$$s.e.(\hat{y}_{new}) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}} = 13.26741\sqrt{\frac{1}{10} + \frac{(75 - 74.4)^2}{1760.4}} = 13.26741 \times 0.31655 = 4.1998$$

The 95% CI is given by $\hat{y}_{new} \pm t_{0.025,8} \times s.e.(\hat{y}_{new})$, i.e.

68.321 ± 2.306 × 4.1998 = 68.321 ± 9.6847 or (58.636, 78.006)

(d) The CI for the individual predicted score will be wider than the CI for the mean score in (i)(c), because the variance for the individual predicted value is larger.

(ii) (a) Both statistics follow a $t_{n-2}$ distribution under the null hypothesis. In addition

$$\frac{\hat\beta - 0}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{S_{xy}/S_{xx}}{\sqrt{\frac{1}{(n-2)S_{xx}}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)}} = \frac{\sqrt{n-2}\;\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}}{\sqrt{1 - \frac{S_{xy}^2}{S_{xx}S_{yy}}}} = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$$

(b) $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{1529.8}{\sqrt{1760.4 \times 2737.6}} = 0.6969.$

Then $\frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{0.6969 \times \sqrt{8}}{\sqrt{1 - 0.6969^2}} = 2.748$, same as in (i)(b).
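The equivalence shown in part (ii)(a) is easy to confirm numerically from the three summary quantities given in the question. A minimal sketch follows, with illustrative variable names.

```python
import math

n = 10
sxx, syy, sxy = 1760.4, 2737.6, 1529.8

# Slope-based statistic from (i)(b)
beta_hat = sxy / sxx
sigma2_hat = (syy - sxy ** 2 / sxx) / (n - 2)
t_beta = beta_hat / math.sqrt(sigma2_hat / sxx)

# Correlation-based statistic from (ii)
r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(f"beta_hat = {beta_hat:.3f}, sigma2_hat = {sigma2_hat:.4f}")
print(f"r = {r:.4f}, t (slope) = {t_beta:.3f}, t (correlation) = {t_r:.3f}")
```

The two statistics agree to floating-point precision, as the algebra requires.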

END OF EXAMINERS’ REPORT

Page 160: ct32005-2010

Faculty of Actuaries Institute of Actuaries

EXAMINATION

9 October 2009 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is not required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 S2009 © Institute of Actuaries


1 In a sample of 100 households in a specific city, the following distribution of number of people per household was observed:

Number of people, x          1   2     3    4     5    6    7
Number of households, f_x    7   f_2   20   f_4   18   10   5

The mean number of people per household was found to be 4.0. However, the frequencies for two and four members per household ($f_2$ and $f_4$ respectively) are missing.

(i) Calculate the missing frequencies $f_2$ and $f_4$. [2]

(ii) Find the median of these data, and hence comment on the symmetry of the data. [2]

[Total 4]

2 Two tickets are selected at random, one after the other and without replacement, from a group of six tickets, numbered 1, 2, 3, 4, 5, and 6.

(i) Calculate the probability that the numbers on the selected tickets add up to 8. [2]

(ii) Calculate the probability that the numbers on the selected tickets differ by 3 or more. [2]

[Total 4]

3 Let X be a random variable with moment generating function $M_X(t)$ and cumulant generating function $C_X(t)$, and let Y = aX + b, where a and b are constants. Let Y have moment generating function $M_Y(t)$ and cumulant generating function $C_Y(t)$.

(i) Show that $C_Y(t) = bt + C_X(at)$. [2]

(ii) Find the coefficient of skewness of Y in the case that $M_X(t) = (1 - t)^{-2}$ and Y = 3X + 2 (you may use the fact that $C_Y'''(0) = E[(Y - \mu_Y)^3]$). [5]

[Total 7]


4 Let the random variables (X,Y) have the joint probability density function

$$f_{X,Y}(x, y) = \exp\{-(x + y)\}, \quad x > 0,\; y > 0.$$

(i) Derive the marginal probability density functions of X and Y, and hence determine (giving reasons) whether or not the two variables are independent. [3]

(ii) Derive the joint cumulative distribution function $F_{X,Y}(x, y)$. [2]

[Total 5]

5 Let X be a random variable with probability density function given by $f(x) = 2\theta^{-2}x$, $0 < x < \theta$. Find an unbiased estimator of θ, based on a single observation of X. [4]

6 A random sample of size n is taken from an exponential distribution with parameter λ, that is, with probability density function $f(x) = \lambda e^{-\lambda x}$, $0 < x < \infty$.

(i) Determine the maximum likelihood estimator (MLE) of λ. [3]

Claim sizes for certain policies are modelled using an exponential distribution with parameter λ. A random sample of such claims results in the value of the MLE of λ as $\hat{\lambda} = 0.00124$.

A large claim is defined as one greater than £4,000 and the claims manager is particularly interested in p, the probability that a claim is a large claim.

(ii) Determine $\hat{p}$, the MLE of p, explaining why it is the MLE. [3]

[Total 6]


7 A scientific investigation involves a linear regression with the usual assumptions that the response variable y follows a normal distribution with mean α + βx and variance σ². Twenty data points were recorded, corresponding to four observations of y at x = 1, three observations of y at x = 2, six observations of y at x = 3, and seven observations of y at x = 4. The resulting means of these sets of y observations are given in the table below.

x              1      2      3      4
no. of y's     4      3      6      7
mean of y's    18.6   21.7   23.2   27.1

(i) Determine the fitted regression line of y on x. [5]

(ii) Suppose that you have been asked to provide a 95% confidence interval for the slope coefficient.

(a) Comment briefly on any problems you might encounter in the computation of the required confidence interval.

(b) Indicate briefly any further information that you would need in order to overcome these problems. [3]

[Total 8]

8 The table below shows a bivariate probability distribution for two discrete random variables X and Y:

         X = 0   X = 1   X = 2
Y = 1    0.15    0.20    0.25
Y = 2    0.05    0.15    0.20

Find the value of E[X|Y = 2]. [3]

9 In a group of motor insurance policies issued by a company, 80% of claims are made on comprehensive policies and 20% are made on third-party-only policies.

(i) Calculate the average amount paid out on a claim, given that the average amount paid out by the company on a comprehensive policy claim is £1,650, and the average amount paid out on a third-party-only policy claim is £625. [1]

(ii) Calculate the total expected amount paid out in claims by the company in one year, given that the total number of policies is 150,000 and, on average, the claim rate is 0.15 claims per policy per year. [2]

[Total 3]


10 Consider a population in which a proportion θ of members have some specified characteristic. Let P denote the corresponding proportion of members in a random sample of size n from the population.

(i) Explain clearly why the mean and standard error of P are given by

$$E[P] = \theta, \qquad \text{s.e.}[P] = \sqrt{\frac{\theta(1 - \theta)}{n}}.$$

[3]

An insurance company uses a questionnaire to monitor the satisfaction of its customers. In one part customers are asked to answer “yes” or “no” to a particular question. Suppose that a random sample of 200 responses is examined.

(ii) Calculate the approximate probability that at least 150 “yes” answers are found in the sample, on the assumption that the true (population) proportion of “yes” answers is 0.7. [4]

Suppose the true (population) proportion of “yes” answers (θ) is unknown, and for a random sample of 200 responses, the number of “yes” answers is found to be 146.

(iii) (a) Calculate an upper (one-sided) 95% confidence interval of the form (0, L) for θ.

(b) Calculate a lower (one-sided) 95% confidence interval of the form (L, 1) for θ.

(c) A test of the hypotheses H0: θ = 0.7 v H1: θ > 0.7 results in a P-value of 0.198. Comment on how this result relates to the confidence interval in part (iii)(b). [9]

[Total 16]


11 Three insurance company colleagues had just completed an investigation which involved the application of a two-sample t-test to compare two independent samples, each of size 11. They were concerned about the validity of the equal variance assumption required for this test. Their data were as follows.

A: 21 22 28 27 20 23 26 32 25 21 30
B: 19 18 38 33 24 39 22 20 28 26 30

$\Sigma x_A = 275$, $\Sigma x_A^2 = 7{,}033$, $\Sigma x_B = 297$, $\Sigma x_B^2 = 8{,}559$

(i) One of the colleagues suggested a graphical approach for the comparison of the variances.

Draw a suitable diagram to represent these data so that the variability of the samples can be compared, and comment briefly on that comparison. [3]

(ii) Another colleague suggested using an F-test for the comparison of variances.

(a) Perform this F-test at the 5% level to compare the variances and express your conclusion clearly.

(b) In addition obtain an approximate value of the P-value for this test by linearly interpolating between suitable entries in the tables. [6]

(iii) The third colleague suggested another procedure using a two-sample t-test in the following way:

“For each sample calculate the absolute values of the deviations of the observations from the mean of that sample; then apply a two-sample t-test to the two sets of absolute deviations.”

(a) Discuss the possible reasoning behind this suggested procedure by considering the potential values of such absolute deviations when the assumption of equal variances is valid and when it is not valid.

(b) (1) Calculate the required sets of absolute deviations for the given data.

(2) Perform the suggested two-sample t-test at the 5% level stating your conclusion clearly.

(3) Obtain, in addition, an approximate P-value for this test by linearly interpolating between suitable entries in the tables. [11]

(iv) Comment briefly on the conclusions that may have been reached by the three colleagues. [2]

[Total 22]


12 A bank has a free telephone number for its customer services department. Often the call volume is heavy and customers are placed on hold until a staff member is available to answer. The bank hopes that a caller remains on hold until the call is answered, so as not to upset or lose an existing or potential customer.

A survey was conducted to analyse whether callers would remain on hold longer (on average), if they heard a recorded message containing:

(A) an advertisement about the bank’s products;
(B) “easy listening” music; or
(C) classical music.

The bank randomly selected a sample of five unanswered calls under each recorded message, and the length of time (in minutes) that the caller remained on hold before hanging up is given in the table below.

Recorded message           Time                Total
A: advertisement           5   1   11  2   8   27
B: easy listening music    0   1   4   6   3   14
C: classical music         13  9   8   15  7   52

For these data Σy = 93, Σy² = 865.

Let $\mu_A, \mu_B, \mu_C$ denote the mean telephone holding times under recorded message A, B and C respectively.

(i) (a) Perform an analysis of variance to test the hypothesis that the nature of the recorded message has no effect on the length of time that callers remain on hold. You should construct an appropriate ANOVA table and state your conclusion clearly.

(b) Calculate a 95% confidence interval for $\mu_A - \mu_C$, using information available from all three samples. [11]

An equivalent approach for analysing the effects of the recorded messages on holding time is the following:

consider the regression model $E[Y_i] = a + b_1 x_{1i} + b_2 x_{2i}$, i = 1, 2, …, 15, where $Y_i$ is the telephone holding time and $x_{1i}, x_{2i}$ are indicator variables such that $x_{1i} = 1$ if the message for caller i contains an advertisement (and 0 otherwise), and $x_{2i} = 1$ if the message contains easy listening music (and 0 otherwise).

The results from fitting this model are given below:

             Coef.     Std. Error   t-value   p-value
Intercept    10.400    1.523        6.828     1.8 × 10^-5
x1           −5.000    2.154        −2.321    0.039
x2           −7.600    2.154        −3.528    0.004

s = 3.406   R-sq = 51.7%


(ii) Using the fitted model:

(a) Calculate the predicted value for the telephone holding time when the message contains classical music.

(b) Test the hypothesis H0: $b_1 = 0$ against H1: $b_1 \neq 0$ at the 5% level of significance.

(c) Derive an expression relating $b_1$ with $\mu_A$ and $\mu_C$, and hence verify your result from the test in (ii)(b) using the confidence interval obtained in (i)(b). [7]

[Total 18]

END OF PAPER


Faculty of Actuaries Institute of Actuaries


Subject CT3 — Probability and Mathematical Statistics

Core Technical

September 2009 Examinations

EXAMINERS’ REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

R D Muckart

Chairman of the Board of Examiners

December 2009

Comments for individual questions are given with the solutions that follow.


Subject CT3 (Probability and Mathematical Statistics Core Technical) — September 2009 — Examiners’ Report


Comments

The paper was answered quite well overall and there are no topics that stand out as being particularly poorly attempted. Similarly there were no particular misunderstandings widely evident, and no particular errors were made so repeatedly as to be worthy of comment.

1

(i) We have $f_2 + f_4 + 60 = 100$ and $\dfrac{7 + 2f_2 + 60 + 4f_4 + 90 + 60 + 35}{100} = 4$. [1]

These give $f_2 + f_4 = 40$ and $2f_2 + 4f_4 = 148$, from which we obtain $f_2 = 6$ and $f_4 = 34$. [1]

(ii) Median is equal to the midpoint between the 50th and 51st ordered observations, i.e. median = 4. [1]

We have mean = median, suggesting that the distribution of these data is roughly symmetric. [1]
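As a check on the arithmetic, the two simultaneous equations and the median can be reproduced with a short Python sketch. The variable names are illustrative; the data are those given in the question.

```python
# Known frequencies from the question; f2 and f4 are the unknowns.
known = {1: 7, 3: 20, 5: 18, 6: 10, 7: 5}
n_total, mean = 100, 4.0

# f2 + f4 and 2*f2 + 4*f4, obtained from the total count and the mean.
count_left = n_total - sum(known.values())
sum_left = n_total * mean - sum(x * f for x, f in known.items())

# Solve the two linear equations.
f4 = (sum_left - 2 * count_left) / 2
f2 = count_left - f4

# Rebuild the full data set and take the midpoint of the 50th and 51st values.
freq = dict(known)
freq[2], freq[4] = int(f2), int(f4)
data = sorted(x for x, f in freq.items() for _ in range(f))
median = (data[49] + data[50]) / 2

print(f2, f4, median)
```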

2

(i) With sample space {(i,j), i = 1, …, 6, j = 1, …, 6, j ≠ i}

(that is, i is the number on the first ticket selected, j that on the second

selected) there are 30 equally likely outcomes.

Favourable outcomes are (2,6), (3,5), (5,3), (6,2)

so probability = 4/30 = 2/15 = 0.133 2

(ii) Favourable outcomes are

(1,4), (4,1), (1,5), (5,1), (1,6), (6,1), (2,5), (5,2), (2,6), (6,2), (3,6), (6,3)

so probability = 12/30 = 0.4 2

OR: Use a sample space of size 15: {(i,j)} where i is smaller number selected,

j is larger.

Then event (i) has 2 favourable outcomes and event (ii) has 6.
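Both probabilities can be confirmed by enumerating the 30 ordered outcomes directly. This is a small Python sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# All ordered draws of two distinct tickets from 1..6 (30 equally likely outcomes).
outcomes = list(permutations(range(1, 7), 2))

# (i) Numbers add up to 8.
p_sum8 = Fraction(sum(1 for i, j in outcomes if i + j == 8), len(outcomes))

# (ii) Numbers differ by 3 or more.
p_diff3 = Fraction(sum(1 for i, j in outcomes if abs(i - j) >= 3), len(outcomes))

print(p_sum8, p_diff3)
```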

3

(i) $M_Y(t) = E[e^{tY}] = E[e^{t(aX+b)}] = e^{tb}E[e^{atX}] = e^{bt}M_X(at)$

$C_Y(t) = \log M_Y(t) = bt + \log M_X(at) = bt + C_X(at)$ [2]

(ii) $C_Y(t) = 2t + \log(1 - 3t)^{-2} = 2t - 2\log(1 - 3t)$ [1]

$C_Y'(t) = 2 + 6(1 - 3t)^{-1}$, $C_Y''(t) = 18(1 - 3t)^{-2}$, $C_Y'''(t) = 108(1 - 3t)^{-3}$ [2]

$E[(Y - \mu_Y)^2] = C_Y''(0) = 18$, $E[(Y - \mu_Y)^3] = C_Y'''(0) = 108$


coefficient of skewness of Y = $108/18^{3/2} = 2^{1/2} = 1.414$ [2]

OR note that coefficient of skewness of Y = coefficient of skewness of X and just work with X (some candidates may recognise X ~ Gamma(2,1) and comment on the formula for coefficient of skewness ($2/\sqrt{\alpha}$) given in the Yellow Book).

4

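(i) $f_X(x) = \int_0^\infty f(x, y)\,dy = \int_0^\infty e^{-x-y}\,dy = e^{-x}\left[-e^{-y}\right]_0^\infty = e^{-x}$. [1]

$f_Y(y) = \int_0^\infty f(x, y)\,dx = \int_0^\infty e^{-x-y}\,dx = e^{-y}\left[-e^{-x}\right]_0^\infty = e^{-y}$ [OR by symmetry]. [1]

Since $f_{X,Y}(x, y) = e^{-x}e^{-y} = f_X(x) f_Y(y)$, X and Y are independent. [1]

(ii) $F_{X,Y}(x, y) = \int_0^x \int_0^y e^{-u-v}\,dv\,du$ [1]

$F_{X,Y}(x, y) = \int_0^x e^{-u}\,du \int_0^y e^{-v}\,dv = \left[-e^{-u}\right]_0^x \left[-e^{-v}\right]_0^y = (1 - e^{-x})(1 - e^{-y})$ [1]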

5

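$E[X] = \int_0^\theta x f(x)\,dx = \int_0^\theta 2\theta^{-2} x^2\,dx$ [1]

$= 2\theta^{-2}\left[\dfrac{x^3}{3}\right]_0^\theta = \dfrac{2}{3}\theta$. [1]

Consider $Z = \dfrac{3}{2}X \Rightarrow E[Z] = \dfrac{3}{2}E[X] = \theta$.

Z is an unbiased estimator of θ. [2]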

6

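(i) $L(\lambda) = \prod \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \Sigma x_i}$ [1]

$\log L(\lambda) = n\log\lambda - \lambda\Sigma x_i$ and $\dfrac{d}{d\lambda}\log L(\lambda) = \dfrac{n}{\lambda} - \Sigma x_i$ [1]

Equate to zero for the MLE: $\hat{\lambda} = \dfrac{n}{\Sigma X_i} = \dfrac{1}{\bar{X}}$

(second derivative is clearly negative, so maximum) [1]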


(ii) p = P(X > 4000) = exp(−4000λ) for the exponential distribution [1]

$\hat{p} = \exp(-4000\hat{\lambda})$ using the invariance property of MLEs [1]

$\hat{p} = \exp(-4000(0.00124)) = 0.0070$ [1]
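The numerical value follows from plugging the estimate of λ into the survival function, as the invariance property allows. A one-line Python check (variable names are illustrative):

```python
import math

lambda_hat = 0.00124   # MLE of lambda quoted in the question
threshold = 4000       # a "large" claim exceeds this amount

# By invariance, the MLE of p = exp(-threshold * lambda) is obtained by substitution.
p_hat = math.exp(-threshold * lambda_hat)

print(round(p_hat, 4))
```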

7

(i) We need to calculate the basic sums Σx, Σx², Σy, Σxy.

n = 20

Σx = 4(1) + 3(2) + 6(3) + 7(4) = 56

Σx² = 4(1²) + 3(2²) + 6(3²) + 7(4²) = 182

Σy = 4(18.6) + 3(21.7) + 6(23.2) + 7(27.1) = 468.4

Σxy = 1(4)(18.6) + 2(3)(21.7) + 3(6)(23.2) + 4(7)(27.1) = 1381.0 [2]

$S_{xy} = 1381.0 - \dfrac{1}{20}(56)(468.4) = 69.48$ and $S_{xx} = 182 - \dfrac{1}{20}(56)^2 = 25.2$ [1]

$\hat{\beta} = \dfrac{69.48}{25.2} = 2.757$ [1]

$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \dfrac{1}{20}[468.4 - (2.757)56] = 15.7$ [1]

$\hat{y} = 15.7 + 2.757x$

(ii) (a) 95% CI for β is $\hat{\beta} \pm t_{0.025,18}\sqrt{\dfrac{\hat{\sigma}^2}{S_{xx}}}$

So we need to calculate $\hat{\sigma}^2 = \dfrac{1}{n-2}\Sigma(y_i - \hat{\alpha} - \hat{\beta}x_i)^2$ or $\dfrac{1}{n-2}\left(S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}\right)$. [1]

Problem as we do not have the individual $y_i$ values, only means of sets of them.

(b) We would need these individual $y_i$ values (or the s.d. or Σy² for each set). [2]
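The fitted line in part (i) can be reproduced from the grouped means alone, which also makes clear why $S_{yy}$ is not available for part (ii). A minimal Python sketch, with illustrative variable names:

```python
# Grouped data from the question: (x value, number of observations, mean of y at that x).
groups = [(1, 4, 18.6), (2, 3, 21.7), (3, 6, 23.2), (4, 7, 27.1)]

n = sum(k for _, k, _ in groups)
sum_x = sum(k * x for x, k, _ in groups)
sum_x2 = sum(k * x**2 for x, k, _ in groups)
sum_y = sum(k * ybar for _, k, ybar in groups)        # group total = count * mean
sum_xy = sum(x * k * ybar for x, k, ybar in groups)

S_xx = sum_x2 - sum_x**2 / n
S_xy = sum_xy - sum_x * sum_y / n

beta_hat = S_xy / S_xx
alpha_hat = (sum_y - beta_hat * sum_x) / n

# Note: the sum of squares of the individual y values cannot be formed from group means.
print(round(alpha_hat, 1), round(beta_hat, 3))
```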

8

X|Y = 2 takes values 0, 1, 2 with probabilities 1/8, 3/8, 4/8

(being in the ratios 1:3:4) 2

So E[X|Y = 2] = 1(3/8) + 2(4/8) = 11/8 = 1.375 1
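The conditional expectation can be verified by dividing the Y = 2 row of the joint table by its total. A short Python sketch using exact fractions (names are illustrative):

```python
from fractions import Fraction

# Joint probabilities P(X = x, Y = 2) from the table in the question.
row_y2 = {0: Fraction("0.05"), 1: Fraction("0.15"), 2: Fraction("0.20")}

p_y2 = sum(row_y2.values())                           # P(Y = 2)
cond = {x: p / p_y2 for x, p in row_y2.items()}       # P(X = x | Y = 2)
e_x_given_y2 = sum(x * p for x, p in cond.items())    # E[X | Y = 2]

print(cond, e_x_given_y2)
```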


9

(i) E(amount) = 0.8(1650) + 0.2(625) = 1,320 + 125 = £1,445 1

(ii) E(number of claims) = 150000(0.15) = 22,500 1

E(total claim amount) = 22500(1445) = £32,512,500

10

(i) Number of sample members with the characteristic X ~ bi(n, θ) with mean nθ and variance nθ(1 − θ). P = X/n. [1]

E[P] = nθ/n = θ [1]

s.e.[P] = $\{V[P]\}^{1/2} = \{V[X]/n^2\}^{1/2} = \{n\theta(1 - \theta)/n^2\}^{1/2} = \{\theta(1 - \theta)/n\}^{1/2}$ [1]

(ii) X ~ bi(200, 0.7) with mean 140 and variance 42 [2]

$P(X \geq 150) \approx P\left(Z > \dfrac{149.5 - 140}{\sqrt{42}}\right) = P(Z > 1.466) = 0.071$ [2]

(iii) (a) Sample proportion P ~ N(θ, θ(1 − θ)/200) approximately

Observed P = 146/200 = 0.73

Estimated standard error(P) = $(0.73 \times 0.27/200)^{1/2} = 0.03139$ [1]

$\Pr\left(\dfrac{P - \theta}{\text{e.s.e.}(P)} > -1.645\right) = 0.95 \Rightarrow \Pr\left(\theta < P + 1.645 \times \text{e.s.e.}(P)\right) = 0.95$ [2]

Upper 95% CI is given by (0, 0.73 + 1.645×0.03139), i.e. (0, 0.782) [1]

(b) By analogy with (iii)(a),

Lower 95% CI is given by (0.73 – 1.645×0.03139, 1), i.e. (0.678, 1) [2]

(c) The P-value indicates that the null hypothesis “θ = 0.7” can stand and we do not have to conclude that θ > 0.7. [1]

The CI in (iii)(b) includes values down to 0.678, so all such values, including 0.7, are consistent with the data when considering how low a value of θ is reasonable. [1]

The two results complement each other. [1]
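The normal approximation in part (ii) and the one-sided limits in part (iii) can be checked with the standard library alone. This sketch uses `math.erf` for the normal distribution function; the helper name `phi` is illustrative.

```python
import math

def phi(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (ii) P(X >= 150) for X ~ bi(200, 0.7), with continuity correction.
n, theta = 200, 0.7
mean, var = n * theta, n * theta * (1 - theta)
z = (149.5 - mean) / math.sqrt(var)
p_at_least_150 = 1 - phi(z)

# (iii) One-sided 95% limits from the observed proportion.
p_obs = 146 / n
ese = math.sqrt(p_obs * (1 - p_obs) / n)
upper_limit = p_obs + 1.645 * ese   # interval (0, upper_limit)
lower_limit = p_obs - 1.645 * ese   # interval (lower_limit, 1)

print(round(z, 3), round(p_at_least_150, 3), round(lower_limit, 3), round(upper_limit, 3))
```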

11

(i) Dotplots on same scale are most suitable

[alternatively boxplots or histograms are acceptable]


2

The spread of the B data appears to be greater than that of the A data and so

casts some doubt on the equal variance assumption. 1

(ii) (a) $s_A^2 = \dfrac{1}{10}\left(7033 - \dfrac{275^2}{11}\right) = 15.8$

$s_B^2 = \dfrac{1}{10}\left(8559 - \dfrac{297^2}{11}\right) = 54.0$ [1]

$F = \dfrac{s_B^2}{s_A^2} = \dfrac{54.0}{15.8} = 3.418$ on 10, 10 df [1]

For a two-sided test at the 5% level, critical value is $F_{10,10}(2.5\%) = 3.717$ [1]

So we accept H0: $\sigma_A^2 = \sigma_B^2$ at the 5% level. [1]

(b) $F_{10,10}(2.5\%) = 3.717$ and $F_{10,10}(5\%) = 2.978$

So P-value is between 0.05 and 0.10 [1]

By interpolation: P-value is

$0.05 + \dfrac{3.717 - 3.418}{3.717 - 2.978}(0.10 - 0.05) = 0.05 + (0.405)(0.05) = 0.070$ [1]

(iii) (a) If the samples have equal variances, then the absolute deviations will be similar in size for both samples; if one sample has a larger variance than the other, then the deviations will be more extreme such that the absolute deviations will be larger for that sample.

A two-sided two-sample t-test applied to these absolute deviations will test for a difference in the means of these absolute deviations and hence for a difference in the variances in the original samples. [2]

(b) (1) $\bar{x}_A = \dfrac{275}{11} = 25$

So the deviations for sample A, i.e. $d_A = |x_A - \bar{x}_A|$, are

4 3 3 2 5 2 1 7 0 4 5 [1]


$\bar{x}_B = \dfrac{297}{11} = 27$

So the deviations for sample B, i.e. $d_B = |x_B - \bar{x}_B|$, are

8 9 11 6 3 12 5 7 1 1 3 [1]

(2) Calculations:

$\Sigma d_A = 36$, $\Sigma d_A^2 = 158$ and $\Sigma d_B = 66$, $\Sigma d_B^2 = 540$

$\bar{d}_A = \dfrac{36}{11} = 3.273$ and $s_{dA}^2 = \dfrac{1}{10}\left(158 - \dfrac{36^2}{11}\right) = 4.018$

$\bar{d}_B = \dfrac{66}{11} = 6.000$ and $s_{dB}^2 = \dfrac{1}{10}\left(540 - \dfrac{66^2}{11}\right) = 14.400$ [1]

$s_{dp}^2 = \dfrac{10(4.018) + 10(14.400)}{20} = 9.209 \Rightarrow s_{dp} = 3.035$ [1]

obs. $t = \dfrac{3.273 - 6.000}{3.035\sqrt{\dfrac{1}{11} + \dfrac{1}{11}}} = \dfrac{-2.727}{1.294} = -2.107$ on 20 df [1]

For the two-sided test at the 5% level, critical value is $t_{20}(2.5\%) = 2.086$ [1]

So we just reject H0: $\mu_{dA} = \mu_{dB}$ and hence H0: $\sigma_A^2 = \sigma_B^2$ at the 5% level. [1]

(3) $t_{20}(2.5\%) = 2.086$ and $t_{20}(1\%) = 2.528$

So P-value is between 0.02 and 0.05 [1]

By interpolation: P-value is

$0.02 + \dfrac{2.528 - 2.107}{2.528 - 2.086}(0.03) = 0.02 + (0.952)(0.03) = 0.049$ [1]

(iv) Tests in (ii) and (iii) give different results at the 5% level, but in fact have quite similar P-values.

Graphical approach in (i) casts doubt on H0.

So all three are fairly consistent. [2]
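The F statistic of part (ii) and the t statistic on absolute deviations of part (iii) can both be recomputed from the raw data. The sketch below uses Python's `statistics` module; critical values still have to come from the tables, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

A = [21, 22, 28, 27, 20, 23, 26, 32, 25, 21, 30]
B = [19, 18, 38, 33, 24, 39, 22, 20, 28, 26, 30]

# (ii) Ratio of sample variances (larger over smaller), on 10, 10 df.
F = variance(B) / variance(A)

# (iii) Absolute deviations of each observation from its own sample mean.
d_A = [abs(x - mean(A)) for x in A]
d_B = [abs(x - mean(B)) for x in B]

# Pooled two-sample t statistic applied to the absolute deviations, on 20 df.
n_A, n_B = len(d_A), len(d_B)
pooled_var = ((n_A - 1) * variance(d_A) + (n_B - 1) * variance(d_B)) / (n_A + n_B - 2)
t = (mean(d_A) - mean(d_B)) / math.sqrt(pooled_var * (1 / n_A + 1 / n_B))

print(round(F, 3), round(t, 3))
```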

12

(i) (a) $\Sigma y_A = 27$, $\Sigma y_B = 14$, $\Sigma y_C = 52$, $\Sigma y = 93$, $\Sigma y^2 = 865$

SST = 865 – 93²/15 = 288.4


SSB = (27² + 14² + 52²)/5 − 93²/15 = 149.2 [2]

SSR = 288.4 − 149.2 = 139.2

Source of variation    d.f.    SS       MSS
Between                2       149.2    74.6
Residual               12      139.2    11.6
Total                  14      288.4
[2]

F = 74.6/11.6 = 6.431 on (2,12) degrees of freedom. [1]

From yellow tables, $F_{2,12}(0.05) = 3.885$ and $F_{2,12}(0.01) = 6.927$. [1]

We can reject the hypothesis of “no message effect” at the 5% significance level, but not at the 1% level. We have some evidence against the “no message effect” hypothesis and conclude that there is a message effect. [1]

(b) $t_{12}(0.025) = 2.179$ [1]

95% CI is $(5.4 - 10.4) \pm 2.179\left\{11.6\left(\dfrac{1}{5} + \dfrac{1}{5}\right)\right\}^{0.5}$ [2]

i.e. −5 ± 4.694 or (−9.694, −0.306) [1]

(ii) (a) We have $\hat{y}_i = 10.4 - 5.0x_{1i} - 7.6x_{2i}$ and for classical music message we need $x_{1i} = x_{2i} = 0$.

This gives $\hat{y}_i = 10.4$ [2]

(b) The P-value is 0.039. We have evidence to reject the hypothesis that $b_1 = 0$ at the 5% level of significance. [1]

(c) For $x_{1i} = 1, x_{2i} = 0$ we have $\mu_A = a + b_1$ and $x_{1i} = x_{2i} = 0$ gives $\mu_C = a$ [1]

$\Rightarrow b_1 = \mu_A - \mu_C$ [1]

The 95% CI for $\mu_A - \mu_C$ in (i)(b) can be used for testing H0: $\mu_A - \mu_C = 0$ and equivalently H0: $b_1 = 0$. The interval does not include the value 0, and thus we reject H0 at the 5% level. [2]
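The ANOVA table and the confidence interval for $\mu_A - \mu_C$ can be rebuilt from the raw holding times. This is a minimal Python sketch; the critical value 2.179 is taken from the tables as quoted above, and the variable names are illustrative.

```python
import math

samples = {
    "A": [5, 1, 11, 2, 8],
    "B": [0, 1, 4, 6, 3],
    "C": [13, 9, 8, 15, 7],
}

all_y = [y for ys in samples.values() for y in ys]
N, k = len(all_y), len(samples)
correction = sum(all_y) ** 2 / N

# Sums of squares: total, between messages, residual.
SST = sum(y * y for y in all_y) - correction
SSB = sum(sum(ys) ** 2 / len(ys) for ys in samples.values()) - correction
SSR = SST - SSB

MSB, MSR = SSB / (k - 1), SSR / (N - k)
F = MSB / MSR

# 95% CI for mu_A - mu_C using the residual mean square on 12 df.
t_crit = 2.179  # t_12(0.025), from the tables
diff = sum(samples["A"]) / 5 - sum(samples["C"]) / 5
half_width = t_crit * math.sqrt(MSR * (1 / 5 + 1 / 5))
ci = (diff - half_width, diff + half_width)

print(round(F, 3), tuple(round(v, 3) for v in ci))
```

The difference of means, −5.0, matches the coefficient of x1 in the fitted regression, as part (ii)(c) shows it must.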

END OF EXAMINERS’ REPORT


Faculty of Actuaries Institute of Actuaries

EXAMINATION

22 April 2010 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is NOT required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 A2010 © Institute of Actuaries


1 The mean height of the women in a large population is 1.671m while the mean height of the men in the population is 1.758m. The mean height of all the members of the population is 1.712m.

Calculate the percentage of the population who are women. [2]

2 Consider a group of 10 life insurance policies, seven of which are on male lives and three of which are on female lives. Three of the 10 policies are chosen at random (one after the other, without replacement).

Find the probability that the three selected policies are all on male lives. [2]

3 Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a population with mean μ and variance σ².

Let the sample mean be $\bar{X}$ and the sample variance be $S^2 = \dfrac{1}{n-1}\{\Sigma X_i^2 - n\bar{X}^2\}$.

You may assume that $E[\bar{X}] = \mu$ and $V[\bar{X}] = \dfrac{\sigma^2}{n}$.

Show that $E[S^2] = \sigma^2$. [3]

4 It is assumed that the numbers of claims arising in one year from motor insurance policies for young male drivers and young female drivers are distributed as Poisson random variables with parameters $\lambda_m$ and $\lambda_f$ respectively.

Independent random samples of 120 policies for young male drivers and 80 policies for young female drivers were examined and yielded the following mean number of claims per policy in the last calendar year: $\bar{x}_m = 0.24$ and $\bar{x}_f = 0.15$.

Calculate an approximate 95% confidence interval for $\lambda_m - \lambda_f$, the difference between the respective Poisson parameters. [3]

5 A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates the process a total of 100 times. Let S denote the sum of the 100 numbers selected.

Calculate the approximate probability that S assumes a value between 280 and 320 inclusive. [5]


6 Let $X_1, X_2, \ldots, X_n$ be a random sample of claim amounts which are modelled using a gamma distribution with known parameter α = 4 and unknown parameter λ.

(i) (a) Specify the distribution of $\sum_{i=1}^{n} X_i$.

(b) Justify the fact that $2n\lambda\bar{X}$ has a $\chi^2_k$ distribution, where $\bar{X}$ is the mean of the sample, by using a suitable relationship between the gamma and the $\chi^2$ distribution, and specify the degrees of freedom k. [3]

A random sample of five such claim amounts yields a mean of $\bar{x} = 17.5$.

(ii) Use the pivotal method with the $\chi^2$ result from part (i)(b) to obtain a 95% confidence interval for λ. [3]

[Total 6]

7 An employment survey is carried out in order to determine the percentage, p, of unemployed people in a certain population in a way such that the estimation has a margin of error less than 0.5% with probability at least 0.95. In a similar study conducted a year ago it was found that the percentage of unemployed people in the population was 6%.

Calculate the sample size, n, that is required to achieve this margin of error, by constructing an appropriate confidence interval (or otherwise). [6]

8 For a sample of 100 insurance policies the following frequency distribution gives the number of policies, f, which resulted in x claims during the last year:

x:   0    1    2   3
f:   76   22   1   1

(i) Calculate the sample mean, standard deviation and coefficient of skewness for these data on the number of claims per policy. [4]

A Poisson model has been suggested as appropriate for the number of claims per policy.

(ii) (a) State the value of the estimated parameter when a Poisson distribution is fitted to these data using the method of maximum likelihood.

(b) Verify that the coefficient of skewness for the fitted model is 1.92, and hence comment on the shape of the frequency distribution relative to that of the corresponding fitted Poisson distribution. [3]

[Total 7]


9 The number of claims, N, arising over a period of five years for a particular policy is assumed to follow a “Type 2” negative binomial distribution (as in the book of Formulae and Tables page 9) with mean $E[N] = \dfrac{k(1-p)}{p}$ and variance $V[N] = \dfrac{k(1-p)}{p^2}$.

Each claim amount, X (in units of £1,000), is assumed to follow an exponential distribution with parameter λ independently of each other claim amount and of the number of claims. Let S be the total of the claim amounts for the period of five years, in the case k = 2, p = 0.8 and λ = 2.

(i) Calculate the mean and the standard deviation of S based on the above assumptions. [4]

Now assume that:

N follows a Poisson distribution with parameter μ = 0.5, that is, with the same mean as N above;

X follows a gamma distribution with parameters α = 2 and λ = 4, that is, with the same mean as X above.

(ii) Calculate the mean and the standard deviation of S based on these assumptions. [3]

(iii) Compare the two sets of answers in (i) and (ii) above. [2]

[Total 9]

10 The size of claims (in units of £1,000) arising from a portfolio of house contents insurance policies can be modelled using a random variable X with probability density function (pdf) given by:

$$f_X(x) = \frac{a c^a}{x^{a+1}}, \quad x \geq c$$

where a > 0 and c > 0 are the parameters of the distribution.

(i) Show that the expected value of X is $E[X] = \dfrac{ac}{a - 1}$, for a > 1. [2]

(ii) Verify that the cumulative distribution function of X is given by

$$F_X(x) = 1 - \left(\frac{c}{x}\right)^a, \quad x \geq c$$

(and = 0 for x < c). [2]


Suppose that for the distribution of claim sizes X it is known that c = 2.5, but a is unknown and needs to be estimated given a random sample $x_1, x_2, \ldots, x_n$.

(iii) Show that the maximum likelihood estimate (MLE) of a is given by:

$$\hat{a} = \frac{n}{\sum_{i=1}^{n} \log\left(\dfrac{x_i}{2.5}\right)}.$$

[3]

(iv) Derive the asymptotic variance of the MLE $\hat{a}$, and hence determine its approximate asymptotic distribution. [4]

Consider a sample of 30 observations from this distribution, for which:

$$\sum_{i=1}^{30} \log(x_i) = 32.9.$$

(v) Calculate the MLE $\hat{a}$ in this case, together with an approximate 95% confidence interval for a. [5]

In the current year, claim sizes are assumed to follow the distribution of X with a = 6, c = 2.5. Inflation for the following year is expected to be 5%.

(vi) Calculate the probability that the size of a claim arising from this portfolio in the following year will exceed £4,000. [3]

[Total 19]


11 Consider the following three independent random samples from a normally distributed population with unknown mean μ:

Sample 1:

19.9 20.4 20.3 22.3 16.7 18.7 20.5 19.0 20.1 16.4 21.5 21.4 17.8 22.5 15.2

For these data: n = 15, $\Sigma x_i = 292.7$, $\Sigma x_i^2 = 5{,}778.69$

Sample 2:

20.8 25.9 22.1 21.7 16.0 12.1 27.6 16.1 16.8 17.1 21.3 18.6 24.9 14.8 22.2

For these data: n = 15, $\Sigma x_i = 298.0$, $\Sigma x_i^2 = 6{,}192.32$, sample mean = 19.867, sample variance = 19.432

Sample 3:

20.6 18.5 21.5 16.9 21.5 21.2 20.9 22.4 14.5 22.0 20.2 17.0 20.3 23.0 19.3 18.9 20.6 20.9 15.3 21.5 16.8 18.5 21.6 16.8 20.4

For these data: n = 25, $\Sigma x_i = 491.1$, $\Sigma x_i^2 = 9{,}773.77$, sample mean = 19.644, sample variance = 5.275

Consider t-tests of the hypotheses H0: μ = 18 v H1: μ > 18.

(i) (a) Calculate the sample mean and variance for Sample 1.

(b) Carry out a t-test of the stated hypotheses using the Sample 1 data (stating the approximate P-value) and show that H0 can be rejected at the 1% level of testing. [6]

(ii) (a) Carry out a t-test of the stated hypotheses using the Sample 2 data (stating the approximate P-value and the conclusion clearly).

(b) Discuss the comparison of the results with those based on Sample 1 (include reasons for any difference or similarity in the test conclusions). [6]

(iii) (a) Carry out a t-test of the stated hypotheses using the Sample 3 data (stating the approximate P-value and your conclusion clearly).

(b) Discuss the comparison of the results with those based on Sample 1 (include reasons for any difference or similarity in the test conclusions). [6]

[Total 18]


12 As part of a project in a modelling module, a statistics student is required to submit a report on the sums insured on home contents insurance policies based on samples of such policies covering risks in five medium-sized towns in each of England, Wales, and Scotland. Data are provided on the average sum insured (Y, in units of £1,000) for each of the 15 towns and are as follows:

Average sum insured, y:

England     11.9   11.1   9.5   9.2   13.9
Wales       5.9    9.1    8.0   5.7   8.1
Scotland    9.3    9.1    7.7   8.2   10.4

For these data: Σy = 55.6 (England), 36.8 (Wales), 44.7 (Scotland); overall Σy = 137.1, Σy² = 1,316.63

The student decides to use an analysis of variance approach.

(i) Suggest brief comments the student should make on the basis of the plot below:

[Figure: Individual and mean value plot of England, Wales, Scotland; vertical axis: average sum insured, from 5 to 14]

[2]

(ii) (a) Carry out the analysis of variance on the average sums insured.

(b) Comment on your conclusions. [6]

The lecturer of the module decides to provide further information. It has been suggested that the value of a UK index of the town’s prosperity (X) might also be a useful explanatory variable (in addition to the country in which the town is situated).

The data on the index, x, are as follows (for the towns in the same order as in the first table):

England     23   27   14   19   29
Wales       15   27   24   18   22
Scotland    22   16   20   25   28

For these data: overall Σx = 329, Σx² = 7,543, Σxy = 3,091.7


A graph of average sum insured against index (with country identified) is given below:

[Figure: Average sum insured v index; scatter plot of y (6 to 14) against x (10 to 30), with plotting symbols * England, + Scotland, % Wales]

The student decides to add the results of a regression approach to her report, using “index” as an explanatory variable, so she fits the regression model

Y = a + bX + e

using the least squares criterion.

Part of the output from fitting the model using a statistics package on a computer is as follows:

Coefficients:
               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)    3.46166    2.21919      1.560     0.1428
x              0.25889    0.09896      2.616     0.0213 *

Residual standard error: 1.789 on 13 degrees of freedom
R-Squared: 0.3449

(iii) Verify (by performing your own calculations) the following results for the fitted model as given in the output above:

(a) the fitted regression line is y = 3.462 + 0.2589x

(b) the percentage of the variation in the response (y) explained by the model (x) is 34.5%

(c) the standard error of the slope estimate is 0.09896 [8]

(iv) Comment briefly on the usefulness of “index” as a predictor for the average sum insured. [2]

(v) Suggest another model which you think might be more successful in explaining the variability in the values of the average sum insured and provide a better predictor. [2]

[Total 20]

END OF PAPER


Faculty of Actuaries Institute of Actuaries

EXAMINERS’ REPORT

April 2010 Examinations

Subject CT3 — Probability and Mathematical Statistics Core Technical

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

R D Muckart

Chairman of the Board of Examiners

July 2010

© Faculty of Actuaries © Institute of Actuaries

Page 185: ct32005-2010

Subject CT3 (Probability and Mathematical Statistics Core Technical) — April 2010 — Examiners’ Report

Page 2

The paper was answered well overall and there is only one question that stands out as being very poorly attempted (and poorly answered by those who did attempt it), namely question 7. The question is about the precision of estimation - it involves applying the standard error of estimation of a sample proportion to find the sample size required to ensure a stated margin of error. The standard error of estimation of a sample proportion is an important and widely-used quantity.

1 Let p be the proportion of women. Then, using a weighted average,

$$1.671p + 1.758(1 - p) = 1.712 \;\Rightarrow\; 0.087p = 0.046 \;\Rightarrow\; p = 0.529$$

so percentage is 52.9%

2 $$P(\text{all 3 on male lives}) = \frac{7}{10} \times \frac{6}{9} \times \frac{5}{8} = \frac{7}{24} = 0.292$$

[OR $\binom{7}{3} \Big/ \binom{10}{3} = 35/120 = 7/24$]

3 $$E[S^2] = \frac{1}{n-1}\left\{\Sigma E(X_i^2) - nE(\bar{X}^2)\right\}$$

$$= \frac{1}{n-1}\left\{\Sigma(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right\}$$

$$= \frac{1}{n-1}\left\{n(\sigma^2 + \mu^2) - \sigma^2 - n\mu^2\right\}$$

$$= \frac{1}{n-1}\left\{(n-1)\sigma^2\right\} = \sigma^2$$

4 Approximate 95% CI for $\lambda_m - \lambda_f$ is

$$(\bar{x}_m - \bar{x}_f) \pm 1.96\sqrt{\frac{\bar{x}_m}{120} + \frac{\bar{x}_f}{80}}$$

$$\Rightarrow (0.24 - 0.15) \pm 1.96\sqrt{\frac{0.24}{120} + \frac{0.15}{80}}$$

$$\Rightarrow 0.09 \pm 1.96(0.062) \Rightarrow 0.09 \pm 0.122 \text{ or } (-0.032, 0.212)$$
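The interval in Question 4 can be reproduced numerically. The following is a minimal sketch, assuming the two sample means (0.24 and 0.15) and sample sizes (120 and 80) quoted in the solution; the function name `poisson_rate_difference_ci` is illustrative and not part of the report.

```python
import math


def poisson_rate_difference_ci(mean_1, n_1, mean_2, n_2, z=1.96):
    """Approximate CI for lambda_1 - lambda_2 from two Poisson sample means."""
    diff = mean_1 - mean_2
    # The variance of a Poisson sample mean is lambda / n, estimated by mean / n.
    se = math.sqrt(mean_1 / n_1 + mean_2 / n_2)
    return diff - z * se, diff + z * se


# Values taken from the worked solution (males: 0.24 on 120, females: 0.15 on 80).
lower, upper = poisson_rate_difference_ci(0.24, 120, 0.15, 80)
print(round(lower, 3), round(upper, 3))
```

The interval contains zero, which is why the solution reports it without claiming a difference between the two rates.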

Page 186: ct32005-2010

5 $S = \Sigma X_i$ where $X_i$ has a uniform distribution on 1, 2, 3, 4, 5, with mean 3 and variance (25 – 1)/12 = 2 (result known, or calculated via $E[X^2] = 11$, or from book of formulae, p10, with a = 1, b = 5, h = 1).

So S ~ N(300, 200) approximately

$$P(280 \le S \le 320) = P\left(\frac{279.5 - 300}{\sqrt{200}} < Z < \frac{320.5 - 300}{\sqrt{200}}\right)$$

$$= P(-1.450 < Z < 1.450) = 0.853$$

6 (i) (a) $\Sigma X_i \sim \text{gamma}(4n, \lambda)$

(b) If $Y \sim \text{gamma}(\alpha, \lambda)$ and $2\alpha$ is an integer, then $2\lambda Y \sim \chi^2_{2\alpha}$ (from book of formulae, p12)

So $2\lambda n\bar{X} \sim \chi^2$ with df 8n.

(ii) $$P\left(\chi^2_{40}(97.5) < 10\lambda\bar{X} < \chi^2_{40}(2.5)\right) = 0.95$$

giving the 95% CI as

$$\left(\frac{\chi^2_{40}(97.5)}{10\bar{X}},\; \frac{\chi^2_{40}(2.5)}{10\bar{X}}\right)$$

Data ⇒

$$\left(\frac{24.43}{10(17.5)},\; \frac{59.34}{10(17.5)}\right) = (0.140, 0.339)$$
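The arithmetic behind the interval in Question 6(ii) can be sketched as follows. This is an illustration only: the chi-square percentage points (24.43 and 59.34 on 40 degrees of freedom) and the sample mean 17.5 are copied from the solution, while the sample size n = 5 is an assumption inferred from the pivot 10λX̄ = 2nλX̄ and df = 8n = 40. The function name is hypothetical.

```python
def rate_ci_from_chi2(n_obs, sample_mean, chi2_low, chi2_high):
    """CI for lambda based on the pivot 2 * lambda * n * mean ~ chi-square."""
    # Dividing the chi-square percentage points by 2 * n * mean isolates lambda.
    scale = 2 * n_obs * sample_mean
    return chi2_low / scale, chi2_high / scale


# n_obs = 5 is inferred (assumption); table values are quoted from the solution.
lower, upper = rate_ci_from_chi2(n_obs=5, sample_mean=17.5,
                                 chi2_low=24.43, chi2_high=59.34)
print(round(lower, 3), round(upper, 3))
```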

Page 187: ct32005-2010

7 The 95% CI for the population percentage p is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

giving

$$|\hat{p} - p| \le 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

For the margin of error to be less than 0.5% we need to solve

$$0.005 = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\Rightarrow\; n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{0.005^2}.$$

Using the percentage from the previous study as the value for $\hat{p}$, i.e. $\hat{p} = 0.06$, we obtain n = 8,666.6. So we need a sample of (at least) 8667 people.
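The sample-size calculation above can be checked with a few lines of code. This is a minimal sketch using the values from the solution (p̂ = 0.06, margin 0.005, z = 1.96); the function name `sample_size_for_margin` is illustrative.

```python
import math


def sample_size_for_margin(p_hat, margin, z=1.96):
    """Solve margin = z * sqrt(p_hat * (1 - p_hat) / n) for n."""
    return z ** 2 * p_hat * (1 - p_hat) / margin ** 2


n_exact = sample_size_for_margin(0.06, 0.005)
# Round up, because the sample size must be a whole number of people.
n_required = math.ceil(n_exact)
print(n_required)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.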

(OR, solution can be based on $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$ and $P(-0.005 < \hat{p} - p < 0.005) > 0.95$ without referring to the CI.)

8 (i) $\Sigma f = 100,\; \Sigma fx = 27,\; \Sigma fx^2 = 35$

$$\bar{x} = \frac{27}{100} = 0.27$$

$$s^2 = \frac{1}{99}\left\{35 - \frac{27^2}{100}\right\} = 0.2799 \quad \therefore s = 0.529$$

Third moment about mean is

$$m_3 = \frac{1}{100}\left\{76(0 - 0.27)^3 + 22(1 - 0.27)^3 + (2 - 0.27)^3 + (3 - 0.27)^3\right\} = 0.3259$$

[OR: using $\Sigma fx^3 = 57$, $m_3 = \frac{1}{100}\left\{57 - 3(0.27)(35) + 2(100)(0.27)^3\right\}$]

So coefficient of skewness is

$$\frac{0.3259}{(0.2799)^{3/2}} = 2.20$$

[OR: can use $m_2 = 0.2771$ in denominator to give 2.23]

(ii) (a) $\hat{\mu} = \bar{x} = 0.27$

Page 188: ct32005-2010

(b) Coefficient of skewness is $\frac{1}{\sqrt{0.27}} = 1.92$ (from book of formulae, p7)

so, the data distribution is slightly more positively skewed than the fitted Poisson.
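The skewness comparison in Question 8 can be reproduced from the frequency table. The sketch below assumes the frequencies 76, 22, 1, 1 for the values 0, 1, 2, 3, as read off the expression for the third moment in the solution; variable names are illustrative.

```python
import math

# Frequencies inferred from the m3 expression in the solution (assumption).
freq = {0: 76, 1: 22, 2: 1, 3: 1}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
# Sample variance uses divisor n - 1; the third central moment uses divisor n.
s2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
m3 = sum(f * (x - mean) ** 3 for x, f in freq.items()) / n

skew_sample = m3 / s2 ** 1.5
# For a Poisson(mu) distribution the coefficient of skewness is 1 / sqrt(mu).
skew_poisson = 1 / math.sqrt(mean)
print(round(skew_sample, 2), round(skew_poisson, 2))
```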

9 (i) $E[N] = \frac{k(1-p)}{p} = \frac{2(0.2)}{0.8} = 0.5$ and $V[N] = \frac{k(1-p)}{p^2} = \frac{2(0.2)}{0.8^2} = 0.625$

$E[X] = \frac{1}{\lambda} = \frac{1}{2} = 0.5$ and $V[X] = \frac{1}{\lambda^2} = \frac{1}{2^2} = 0.25$

$E[S] = E[N]E[X] = 0.5 \times 0.5 = 0.25$, i.e. £250

$V[S] = E[N]V[X] + V[N]\{E[X]\}^2 = 0.5 \times 0.25 + 0.625 \times 0.5^2 = 0.28125$

$\therefore SD[S] = 0.530$, i.e. £530

(ii) $E[N] = V[N] = \mu = 0.5$

$E[X] = \frac{\alpha}{\lambda} = \frac{2}{4} = 0.5$ and $V[X] = \frac{\alpha}{\lambda^2} = \frac{2}{4^2} = 0.125$

$E[S] = E[N]E[X] = 0.5 \times 0.5 = 0.25$, i.e. £250

$V[S] = E[N]V[X] + V[N]\{E[X]\}^2 = 0.5 \times 0.125 + 0.5 \times 0.5^2 = 0.1875$

$\therefore SD[S] = 0.433$, i.e. £433

(iii) As expected the means are the same, but the standard deviation in (i) is larger than that in (ii) due to the fact that both N and X have larger variances.
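The compound-distribution moments used in Question 9 follow the same two formulas in both parts, so they can be wrapped in one helper. This is a sketch under the parameter values shown in the solution; `compound_moments` is an illustrative name.

```python
import math


def compound_moments(mean_n, var_n, mean_x, var_x):
    """Mean and standard deviation of S = X_1 + ... + X_N (N independent of the X_i)."""
    mean_s = mean_n * mean_x
    var_s = mean_n * var_x + var_n * mean_x ** 2
    return mean_s, math.sqrt(var_s)


# (i) N negative binomial (k = 2, p = 0.8), X exponential (lambda = 2).
mean_i, sd_i = compound_moments(0.5, 0.625, 0.5, 0.25)
# (ii) N Poisson (mu = 0.5), X gamma (alpha = 2, lambda = 4).
mean_ii, sd_ii = compound_moments(0.5, 0.5, 0.5, 0.125)
print(mean_i, round(sd_i, 3), mean_ii, round(sd_ii, 3))
```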

Page 189: ct32005-2010

10 (i) We have:

$$E[X] = \int_c^\infty x f_X(x)\,dx = \int_c^\infty x\,\frac{ac^a}{x^{a+1}}\,dx = ac^a\int_c^\infty x^{-a}\,dx = -\frac{ac^a}{a-1}\left[x^{-(a-1)}\right]_c^\infty$$

and for a > 1

$$E[X] = -\frac{ac^a}{a-1}\left(0 - c^{-a+1}\right) = \frac{ac}{a-1}.$$

(ii) $$F_X(x) = \int_c^x f_X(t)\,dt = \int_c^x \frac{ac^a}{t^{a+1}}\,dt$$

which gives

$$F_X(x) = -c^a\left[t^{-a}\right]_c^x = -c^a\left(x^{-a} - c^{-a}\right) = 1 - \left(\frac{c}{x}\right)^a, \quad x \ge c$$

[OR differentiate $F_X(x)$ to obtain $f_X(x)$]

(iii) The likelihood function is given by:

$$L(a) = \prod_{i=1}^n f_X(x_i) = \prod_{i=1}^n \frac{ac^a}{x_i^{a+1}} = a^n c^{na} \prod_{i=1}^n x_i^{-(a+1)}$$

and

$$l(a) = n\log(a) + na\log(c) - (a+1)\sum_{i=1}^n \log(x_i)$$

For the MLE:

$$l'(a) = 0 \;\Rightarrow\; \frac{n}{a} + n\log(c) - \sum_{i=1}^n \log(x_i) = 0$$

$$\Rightarrow\; \hat{a} = \frac{n}{\sum_{i=1}^n \log(x_i) - n\log(c)} = \frac{n}{\sum_{i=1}^n \log\left(\frac{x_i}{c}\right)},$$

and for c = 2.5,

$$\hat{a} = \frac{n}{\sum_{i=1}^n \log\left(\frac{x_i}{2.5}\right)}$$

Page 190: ct32005-2010

(iv) For the asymptotic variance we use the Cramer-Rao lower bound:

$$l''(a) = -\frac{n}{a^2}, \quad\text{and}\quad E\left[l''(a)\right] = -\frac{n}{a^2}$$

giving

$$V[\hat{a}] = \left\{-E\left[l''(a)\right]\right\}^{-1} = \frac{a^2}{n}.$$

Hence, asymptotically, $\hat{a} \sim N(a, a^2/n)$.

(v) MLE is

$$\hat{a} = \frac{n}{\sum_{i=1}^n \log\left(\frac{x_i}{c}\right)} = \frac{n}{\sum_{i=1}^n \log(x_i) - n\log(c)} = \frac{30}{32.9 - 30 \times \log(2.5)} = 5.544.$$

Using the asymptotic normal distribution given above, an approximate 95% CI is given by

$$\hat{a} \pm 1.96\sqrt{\frac{\hat{a}^2}{n}} = \hat{a} \pm 1.96\frac{\hat{a}}{\sqrt{n}}$$

i.e. $5.544 \pm 1.96\frac{5.544}{\sqrt{30}}$, giving (3.560, 7.528).

(vi) Size of claim in the following year will be given by 1.05X

So we want

$$P(1.05X > 4) = P\left(X > \frac{4}{1.05}\right) = 1 - F_X\left(\frac{4}{1.05}\right)$$

and using $F_X$ given in the question

$$P(1.05X > 4) = \left(\frac{1.05 \times 2.5}{4}\right)^6 = 0.0799.$$
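Parts (v) and (vi) of Question 10 reduce to a handful of arithmetic steps, sketched below. The inputs n = 30, Σ log(x_i) = 32.9 and c = 2.5 are the values that appear in the solution; the exponent a = 6 in part (vi) is taken from the final expression of the solution (the question text that fixes it is not reproduced here). The helper name `pareto_survival` is illustrative.

```python
import math

n, sum_log_x, c = 30, 32.9, 2.5

# MLE of the shape parameter: a_hat = n / (sum(log x_i) - n log c).
a_hat = n / (sum_log_x - n * math.log(c))
# Asymptotic standard error from the Cramer-Rao lower bound a^2 / n.
se = a_hat / math.sqrt(n)
ci = (a_hat - 1.96 * se, a_hat + 1.96 * se)


def pareto_survival(x, a, c):
    """P(X > x) = (c / x)^a for x >= c, using F_X(x) = 1 - (c / x)^a."""
    return (c / x) ** a if x >= c else 1.0


# Part (vi): claims inflate by 5%, with a = 6 as used in the solution.
tail = pareto_survival(4 / 1.05, a=6, c=c)
print(round(a_hat, 3), tuple(round(v, 3) for v in ci), round(tail, 4))
```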

Page 191: ct32005-2010

11 (i) (a) $$\bar{x} = \frac{292.7}{15} = 19.513, \quad s^2 = \frac{1}{14}\left(5778.69 - \frac{292.7^2}{15}\right) = 4.7955$$

(b) Test statistic is $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

Here $t = (19.513 - 18)/(4.7955/15)^{1/2} = 2.68$

P-value = $P(t_{14} > 2.68)$, which is just less than 0.01 (1%)

We reject H0 and accept “μ > 18” at the 1% level of testing.

(ii) (a) Here $t = (19.867 - 18)/(19.432/15)^{1/2} = 1.64$

P-value = $P(t_{14} > 1.64)$, which is between 0.05 and 0.1.

P-value exceeds 5% and so we cannot reject H0, so “μ = 18” can stand.

(b) Sample 2 does not provide enough evidence to justify rejecting H0, despite having the same size and a similar mean to Sample 1.

The reason for the loss of significance is the much greater variation in the data in Sample 2 – the variance is four times bigger than in Sample 1 (19.432 v 4.7955) – this greatly increases the standard error of estimation and reduces the value of the t-statistic (1.64 v 2.68).

(iii) (a) Here $t = (19.644 - 18)/(5.275/25)^{1/2} = 3.58$

P-value = $P(t_{24} > 3.58)$, which is less than 0.001 (0.1%)

We reject H0 and accept “μ > 18” at a level lower than 0.1%.

(b) Sample 3 provides even stronger evidence against H0, despite having a similar mean and variance to Sample 1.

The main reason for the much greater level of significance is the increased sample size (25 v 15) – this decreases the standard error of estimation and increases the value of the t-statistic considerably (3.58 v 2.68).
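The three test statistics in Question 11 differ only in their summary inputs, which makes the effect of variance and sample size easy to see side by side. The sketch below uses the sample means, variances and sizes quoted in the solution; `one_sample_t` is an illustrative helper.

```python
import math


def one_sample_t(mean, variance, n, mu0=18.0):
    """t statistic for H0: mu = mu0, using the sample variance."""
    return (mean - mu0) / math.sqrt(variance / n)


# (sample mean, sample variance, sample size) as quoted in the solution.
samples = {
    "Sample 1": (19.513, 4.7955, 15),
    "Sample 2": (19.867, 19.432, 15),
    "Sample 3": (19.644, 5.275, 25),
}

t_values = {name: one_sample_t(*stats) for name, stats in samples.items()}
for name, t in t_values.items():
    print(name, round(t, 2))
```

Sample 2 has the largest mean but the smallest statistic, because its standard error is roughly twice that of Sample 1.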

Page 192: ct32005-2010

12 (i) • the three sets of points are positioned at different levels (the means are shown), so there is a prima facie case for suggesting that the underlying means are different (i.e. there are country effects)

• the means are in the order England (highest), Scotland, Wales (lowest)

• the variation in the data for Scotland is perhaps lower than that for England, but with only 5 observations for each country, we cannot be sure that there is a real underlying difference in variance

(ii) (a) SST = 1316.63 – 137.1²/15 = 63.536,

SSB = (55.6² + 36.8² + 44.7²)/5 – 137.1²/15 = 35.644

∴ SSR = 63.536 – 35.644 = 27.892

Source of variation   Df   SS       MSS
Between countries      2   35.644   17.82
Residual              12   27.892   2.324
Total                 14

Under H0: no country effects F = 17.82/2.324 = 7.67 on (2,12) df

P-value of F = 7.67 is less than 0.01, so we reject H0 and conclude that there are differences among the population means of the average sum insured

(b) We have strong evidence that country effects exist − the means appear to be in the order England (highest), Scotland, Wales (lowest).

(iii) (a) $S_{xx}$ = 7543 – 329²/15 = 326.9333, $S_{yy}$ = 63.536 (the SST value from (ii)(a) above)

$S_{xy}$ = 3091.7 – 329×137.1/15 = 84.64

$\hat{\beta} = 84.64/326.9333 = 0.25889$, $\hat{\alpha} = 137.1/15 - \hat{\beta} \times (329/15) = 3.4617$

So fitted line is y = 3.462 + 0.2589x

(b) $R^2 = S_{xy}^2/(S_{xx}S_{yy}) = 84.64^2/(326.9333 \times 63.536) = 0.34488$ so 34.5%

(c) $SS_{RES} = S_{yy} - S_{xy}^2/S_{xx} = 63.536 - 84.64^2/326.9333 = 41.62349$

⇒ $\hat{\sigma}^2 = 41.62349/13 = 3.201807$

⇒ $s.e.(\hat{\beta}) = (3.201807/326.9333)^{1/2} = 0.09896$
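The regression verification in part (iii) can be carried out directly from the summary sums. The sketch below assumes the totals used in the solution (n = 15, Σx = 329, Σy = 137.1, Σx² = 7543, Σy² = 1316.63, Σxy = 3091.7); variable names are illustrative.

```python
import math

n = 15
sum_x, sum_y = 329.0, 137.1
sum_xx, sum_yy, sum_xy = 7543.0, 1316.63, 3091.7

# Corrected sums of squares and products.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n
r_squared = s_xy ** 2 / (s_xx * s_yy)

# The residual variance uses n - 2 degrees of freedom.
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
se_slope = math.sqrt(sigma2_hat / s_xx)
t_slope = slope / se_slope
print(round(intercept, 3), round(slope, 4), round(r_squared, 3), round(se_slope, 5))
```

The ratio `t_slope` agrees with the t value of 2.616 reported for x in the package output quoted in the question.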

Page 193: ct32005-2010

(iv) From the plot we see that the relationship between “index” and “average sum insured” is weak, positive (and possibly linear) – the percentage of the variation in “average sum insured” explained by the relationship with “index” is only 34.5%.

So “index” is of some, but limited, use as a predictor of “average sum insured”.

(v) We should try a “multiple regression” model which includes “country” and “index” in the model.

[Note: although not explicitly in the syllabus, a comment to the effect that “Country” should be included as a qualitative variable (a “factor”) e.g. by using a text vector (with entries “E”, “W”, “S” say) or a pair of (Bernoulli) dummy variables, may attract a bonus for a borderline candidate.]

END OF EXAMINERS’ REPORT

Page 194: ct32005-2010

Faculty of Actuaries Institute of Actuaries

EXAMINATION

4 October 2010 (am)

Subject CT3 — Probability and Mathematical Statistics Core Technical

Time allowed: Three hours

INSTRUCTIONS TO THE CANDIDATE

1. Enter all the candidate and examination details as requested on the front of your answer booklet.

2. You must not start writing your answers in the booklet until instructed to do so by the supervisor.

3. Mark allocations are shown in brackets.

4. Attempt all 12 questions, beginning your answer to each question on a separate sheet.

5. Candidates should show calculations where this is appropriate.

Graph paper is required for this paper.

AT THE END OF THE EXAMINATION

Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this question paper.

In addition to this paper you should have available the 2002 edition of the Formulae and Tables and your own electronic calculator from the approved list.

© Faculty of Actuaries CT3 S2010 © Institute of Actuaries

Page 195: ct32005-2010

CT3 S2010—2

1 The marks of a sample of 25 students from a large class in a recent test have sample mean 57.2 and standard deviation 7.3. The marks are subsequently adjusted: each mark is multiplied by 1.1 and the result is then increased by 8.

Calculate the sample mean and standard deviation of the adjusted marks. [2]

2 In a survey, a sample of 10 policies is selected from the records of an insurance company. The following data give, in ascending order, the time (in days) from the start date of the policy until a claim has arisen from each of the policies in the sample.

297 301 312 317 355 379 404 419 432+ 463+

Some of the policies have not yet resulted in any claims at the time of the survey, so the times until they each give rise to a claim are said to be censored. These values are represented with a plus sign in the above data.

(i) Calculate the median of this sample. [2]

(ii) State what you can conclude about the mean time until claims arise from the policies in this sample. [2]

[Total 4]

3 Suppose that in a group of insurance policies (which are independent as regards occurrence of claims), 20% of the policies have incurred claims during the last year. An auditor is examining the policies in the group one by one in random order until two policies with claims are found.

(i) Determine the probability that exactly five policies have to be examined until two policies with claims are found. [2]

(ii) Find the expected number of policies that have to be examined until two policies with claims are found. [1]

[Total 3]

4 For a certain class of business, claim amounts are independent of one another and are distributed about a mean of μ = £4,000 and with standard deviation σ = £500.

Calculate an approximate value for the probability that the sum of 100 such claim amounts is less than £407,500. [3]

5 A random sample of 200 travel insurance policies contains 29 on which the policyholders made claims in their most recent year of cover.

Calculate a 99% confidence interval for the proportion of policyholders who make claims in a given year of cover. [3]

Page 196: ct32005-2010

CT3 S2010—3 PLEASE TURN OVER

6 The random variable X has a Poisson distribution with mean Y, where Y itself is considered to be a random variable. The distribution of Y is lognormal with parameters μ and σ².

Derive the unconditional mean E[X] and variance V[X] using appropriate conditional moments. (You may use any standard results without proof, including results from the book of Formulae and Tables.) [4]

7 Let X be a discrete random variable with the following probability distribution:

x:         0    1    2    3
P(X = x):  0.4  0.3  0.2  0.1

(i) Simulate three observations of X using the following three random numbers from a uniform distribution on (0,1) (you should explain your method briefly and clearly).

Random numbers: 0.4936, 0.7269, 0.1652 [3]

Let X be a random variable with cumulative distribution function:

$$F_X(x) = \frac{1}{1 - e^{-1}}\left(1 - e^{-x^2}\right), \quad 0 < x < 1$$

($F_X(x) = 0$ for x ≤ 0 and $F_X(x) = 1$ for x ≥ 1).

(ii) Derive an expression for a simulated value of X using a random number u from a uniform distribution on (0,1) and hence simulate an observation of X using the random number u = 0.8149. [3]

[Total 6]

8 A certain type of claim amount (in units of £1,000) is modelled as an exponential random variable with parameter λ = 1.25. An analyst is interested in S, the total of 10 such independent claim amounts. In particular he wishes to calculate the probability that S exceeds £10,000.

(i) (a) Show, using moment generating functions, that:

(1) S has a gamma distribution, and

(2) 2.5S has a $\chi^2_{20}$ distribution.

(b) Use tables to calculate the required probability. [5]

(ii) (a) Specify an approximate normal distribution for S by applying the central limit theorem, and use this to calculate an approximate value for the required probability.

(b) Comment briefly on the use of this approximation and on the result. [3]

[Total 8]

Page 197: ct32005-2010

CT3 S2010—4

9 Let the random variable X have the Poisson distribution with probability function:

$$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

(i) Show that $P(X = k+1) = \frac{\lambda}{k+1}P(X = k), \quad k = 0, 1, 2, \ldots$ [2]

It is believed that the distribution of the number of claims which arise on insurance policies of a certain class is Poisson. A random sample of 1,000 policies is taken from all the policies in this class which have been in force throughout the past year. The table below gives the number of claims per policy in this sample.

No. of claims, k:      0    1    2    3   4   5  6  7  8 or more
No. of policies, f_k:  310  365  202  88  26  6  2  1  0

For these data the maximum likelihood estimate (MLE) of the Poisson parameter λ is $\hat{\lambda} = 1.186$.

(ii) Calculate the frequencies expected under the Poisson model with parameter given by the MLE above, using the recurrence formula of part (i) (or otherwise). [3]

(iii) Perform an appropriate statistical test to investigate the assumption that the numbers of claims arising from this particular class of policies follow a Poisson distribution. [5]

[Total 10]

10 In the collection of questionnaire data, randomised response sampling is a method which is used to obtain answers to sensitive questions. For example a company is interested in estimating the proportion, p, of its employees who falsely take days off sick. Employees are unlikely to answer a direct question truthfully and so the company uses the following approach.

Each employee selected in the survey is given a fair six-sided die and asked to throw

it. If it comes up as a 5 or 6, then the employee answers yes or no to the question “have you falsely taken any days off sick during the last year?”. If it comes up as a 1, 2, 3 or 4, then the employee is instructed to toss a coin and answer yes or no to the question “did you obtain heads?”. So an individual’s answer is either yes or no, but it is not known which question the individual has answered.

For the purpose of the following analysis you should assume that each employee

answers the question truthfully.

(i) Show that the probability that an individual answers yes is $\frac{1}{3}(p + 1)$. [2]

Suppose that 100 employees are surveyed and that this results in 56 yes answers.

Page 198: ct32005-2010

CT3 S2010—5 PLEASE TURN OVER

(ii) (a) Show that the likelihood function L(p) can be expressed in the form:

$$L(p) \propto (p + 1)^{56}(2 - p)^{44}.$$

(b) Hence show that the maximum likelihood estimate (MLE) of p is $\hat{p} = 0.68$. [5]

Let $\theta = \frac{1}{3}(p + 1)$ and note that, using binomial results, the MLE of θ is $\hat{\theta} = \frac{56}{100}$.

(iii) Explain why $\hat{p}$ can be obtained as the solution of $\frac{1}{3}(\hat{p} + 1) = \hat{\theta}$, and hence verify that $\hat{p} = 0.68$. [2]

(iv) (a) Determine the second derivative of the log likelihood for p and evaluate it at $\hat{p} = 0.68$.

(b) State an approximate large-sample distribution for the MLE $\hat{p}$.

(c) Hence calculate approximate 95% confidence limits for p. [5]

Now suppose that the same numerical estimate, that is $\hat{p} = 0.68$, had been obtained from a sample of the same size, that is 100, without using the randomised response method but relying on truthful answers. So the number of yes answers was 68 and $\hat{p} = \frac{68}{100}$ using binomial results.

(v) (a) Calculate approximate 95% confidence limits for p for this situation.

(b) Suggest why the confidence limits in part (iv)(c) are wider than these limits. [3]

[Total 17]

Page 199: ct32005-2010

CT3 S2010—6

11 A life insurance company issuing critical illness insurance wants to compare the delay times from the date when a claim is made until it is settled, for different causes of illness covered. Random samples of 12 claims associated with two types of illness (A and B) related to heart disease have been collected. The logarithms of the delay times are given below (where the original times were measured in days):

Cause A, y_A: 4.0 5.4 4.6 3.5 4.2 4.5 4.2 4.9 5.1 5.2 5.1 5.4

Cause B, y_B: 5.7 5.6 4.2 5.1 4.4 5.9 5.4 3.9 5.7 4.5 4.8 3.9

For these data: $\sum y_A = 56.1,\; \sum y_A^2 = 266.33,\; \sum y_B = 59.1,\; \sum y_B^2 = 297.03$

(i) Use a suitable t-test to investigate the hypothesis that the mean delay time is the same for claims related to the two causes of illness and state clearly your conclusion. [6]

(ii) Give a possible reason why the logarithms of the original delay time observations are used in this analysis. [2]

(iii) (a) Calculate an equal-tailed 95% confidence interval for $\sigma_A^2/\sigma_B^2$, the ratio of the variances of the delay times for the two causes of illness.

(b) Comment on the validity of the test in part (i) based on this confidence interval. [4]

The company collects a third sample of 12 claims associated with an illness (C) related to brain disease, and the logarithms of the delay times are given below:

Cause C, y_C: 5.6 6.2 6.0 5.6 7.1 5.0 4.5 6.4 4.6 6.0 5.5 5.3

For these data: $\sum y_C = 67.8,\; \sum y_C^2 = 389.28$

For data in all three samples: $\sum\sum y = 183.0,\; \sum\sum y^2 = 952.64$

(iv) Use analysis of variance to test the hypothesis that the mean delay times are the same for all three causes of illness. [6]

(v) State the assumptions made for this analysis of variance. [2]

Page 200: ct32005-2010

CT3 S2010—7 PLEASE TURN OVER

(vi) Comment briefly on the validity of the test in (iv), using the plot of the residuals of the analysis given below. [2]

[Figure: plot of residuals by cause. Horizontal axis: Residuals (ticks -1.5 to 1.5). Vertical axis: Cause (A, B, C).]

[Total 22]

Page 201: ct32005-2010

CT3 S2010—8

12 An investigation concerning the improvement in the average performance of female track athletes relative to male track athletes was conducted using data from various international athletics meetings over a period of 16 years in the 1950s and 1960s. For each year and each selected track distance the observation y was recorded as the average of the ratios of the twenty best male times to the corresponding twenty best female times.

The data for the 100 metres event are given below together with some summaries.

year t:   1      2      3      4      5      6      7      8
ratio y:  0.882  0.879  0.876  0.888  0.890  0.882  0.885  0.886

year t:   9      10     11     12     13     14     15     16
ratio y:  0.885  0.887  0.882  0.893  0.878  0.889  0.888  0.890

$\Sigma t = 136,\; \Sigma t^2 = 1496,\; \Sigma y = 14.160,\; \Sigma y^2 = 12.531946,\; \Sigma ty = 120.518$

(i) Draw a scatterplot of these data and comment briefly on any relationship between ratio and year. [3]

(ii) Verify that the equation of the least squares fitted regression line of ratio on year is given by:

y = 0.88105 + 0.000465t. [4]

(iii) (a) Calculate the standard error of the estimated slope coefficient in part (ii).

(b) Determine whether the null hypothesis of “no linear relationship” would be accepted or rejected at the 5% level.

(c) Calculate a 95% confidence interval for the underlying slope coefficient for the linear model. [5]

Corresponding data for the 200 metres event resulted in an estimated slope coefficient of: $\hat{\beta} = 0.000487$ with standard error 0.000220.

(iv) (a) Determine whether the “no linear relationship” hypothesis would be accepted or rejected at the 5% level.

(b) Calculate a 95% confidence interval for the underlying slope coefficient for the linear model and comment on whether or not the underlying slope coefficients for the two events, 100m and 200m, can be regarded as being equal.

(c) Discuss why the results of the tests in parts (iii)(b) and (iv)(a) seem to contradict the conclusion in part (iv)(b). [6]

[Total 18]

END OF PAPER

Page 202: ct32005-2010

INSTITUTE AND FACULTY OF ACTUARIES

EXAMINERS’ REPORT

September 2010 examinations

Subject CT3 — Probability and Mathematical Statistics Core Technical

Introduction

The attached subject report has been written by the Principal Examiner with the aim of helping candidates. The questions and comments are based around Core Reading as the interpretation of the syllabus to which the examiners are working. They have however given credit for any alternative approach or interpretation which they consider to be reasonable.

T J Birse
Chairman of the Board of Examiners
December 2010

© Institute and Faculty of Actuaries

Page 203: ct32005-2010

Subject CT3 (Probability and Mathematical Statistics Core Technical) — September 2010 — Examiners’ Report

Page 2

Comments

The paper was answered well and overall performance was satisfactory. However, some questions were poorly attempted. A number of candidates could not answer Question 1 correctly and efficiently – the question required basic knowledge of data summaries. Question 8 was not answered very well – answers to questions that require candidates to “show” a particular statement, need to demonstrate intermediate steps clearly and accurately. The same applies to Question 10, where deriving specific results regarding maximum likelihood estimation was not performed accurately by many candidates.

1 Sample mean = (1.1 × 57.2) + 8 = 70.92

Sample standard deviation = 1.1 × 7.3 = 8.03

2 (i) Sample median is not affected by the fact that the last two observations are censored. It is therefore given by the 5.5th ranked observation, i.e. (355 + 379)/2 = 367 days.

(ii) We know that the last two observations have minimum values 432 and 463. Using these two values the sample mean would be equal to 3679/10 = 367.9. So, the sample mean is at least equal to 367.9 days.

3 (i) Using the negative binomial distribution, or from first principles,

$$P(\text{5 policies required}) = \binom{5-1}{2-1}(0.2)^2(0.8)^3 = 0.0819$$

(ii) Expected number = mean of negative binomial distribution = $\frac{2}{0.2} = 10$

4 Working in units of £1,000, sum of 100 claim amounts S has E[S] = 100 × 4 = 400 and V[S] = 100 × 0.5² = 25, and so S ~ N(400, 5²) approximately.

P(S < 407.5) = P(Z < 1.5) = 0.933

5 Sample proportion = 29/200 = 0.145

99% CI is given by

$$0.145 \pm 2.5758\sqrt{\frac{0.145 \times 0.855}{200}}$$

i.e. 0.145 ± 0.064

i.e. (0.081, 0.209).
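The interval in Question 5 can be reproduced without tables by taking the normal percentage point from Python's standard library. This is a minimal sketch; `proportion_ci` is an illustrative name, and the inputs (29 claims in 200 policies, 99% level) are those of the question.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / n
    # Two-sided interval: use the upper (1 - level) / 2 point of N(0, 1).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(29, 200, 0.99)
print(round(lower, 3), round(upper, 3))
```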

Page 204: ct32005-2010

6 $$E[X] = E[E(X|Y)] = E[Y] = e^{\mu + \sigma^2/2}$$

$$V[X] = E[V(X|Y)] + V[E(X|Y)] = E[Y] + V[Y] = e^{\mu + \sigma^2/2} + e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

7 (i) Method

0 < u ≤ 0.4 ⇒ x = 0
0.4 < u ≤ 0.7 ⇒ x = 1
0.7 < u ≤ 0.9 ⇒ x = 2
0.9 < u ≤ 1 ⇒ x = 3

We get x = 1, 2, 0

(ii) Setting $u = \frac{1}{1 - e^{-1}}\left(1 - e^{-x^2}\right)$

⇒ $e^{-x^2} = 1 - u\left(1 - e^{-1}\right)$

⇒ $x = \left[-\log\left\{1 - u\left(1 - e^{-1}\right)\right\}\right]^{1/2}$

u = 0.8149 ⇒ x = 0.851

8 (i) (a) (1) Let $X_i$ be a claim amount.

Mgf of $X_i$ is $M_X(t) = \left(1 - \frac{t}{1.25}\right)^{-1}$

Mgf of $S = \sum_{i=1}^{10} X_i$ is $M_S(t) = [M_X(t)]^{10} = \left(1 - \frac{t}{1.25}\right)^{-10}$,

which is the mgf of a gamma(10, 1.25) variable.

(2) Mgf of 2.5S is $E[e^{t(2.5S)}] = E[e^{(2.5t)S}] = M_S(2.5t) = (1 - 2t)^{-10}$,

which is the mgf of a gamma(10, ½) variable, i.e. $\chi^2_{20}$.

(b) $P(\text{total} > £10{,}000) = P(S > 10) = P(\chi^2_{20} > 25) = 1 - 0.7986 = 0.2014$

(ii) (a) S has mean $\frac{10}{1.25} = 8$ and variance $\frac{10}{1.25^2} = 6.4$. So $S \approx N(8, 6.4)$

$$P(S > 10) \cong P\left(Z > \frac{10 - 8}{\sqrt{6.4}}\right) = P(Z > 0.791) = 1 - 0.786 = 0.214$$

Page 205: ct32005-2010

(b) n is not particularly large for the use of the CLT, but the approximation is still quite close to the true probability.
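The closeness of the two answers in Question 8 can be checked numerically. The sketch below is not the method of the report: instead of chi-square tables it relies on the standard identity that, for an integer shape k, P(Gamma(k, rate) > x) equals P(Poisson(rate · x) ≤ k − 1). Function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def gamma_survival_integer_shape(x, shape, rate):
    """P(Gamma(shape, rate) > x) for integer shape, via the Poisson cdf identity."""
    mu = rate * x
    return sum(math.exp(-mu) * mu ** j / math.factorial(j) for j in range(shape))


# Exact probability that the total of 10 exponential(1.25) claims exceeds 10.
exact = gamma_survival_integer_shape(10, shape=10, rate=1.25)
# CLT approximation with mean 8 and variance 6.4.
approx = 1 - NormalDist(mu=8, sigma=math.sqrt(6.4)).cdf(10)
print(round(exact, 4), round(approx, 4))
```

The normal approximation overstates the tail probability slightly, consistent with the comment in part (ii)(b).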

9 (i) $$P(X = k+1) = \frac{e^{-\lambda}\lambda^{k+1}}{(k+1)!} = \frac{e^{-\lambda}\lambda^k}{k!}\cdot\frac{\lambda}{k+1} = \frac{\lambda}{k+1}P(X = k).$$

(ii) Using $P(X = 0) = e^{-1.186}$, P(X = 8 or more) = $1 - \sum_{i=0}^{7} P(X = i)$, and the recurrence formula, we obtain:

k             0       1       2       3       4       5       6       7       8 or more
P(X = k)      0.3054  0.3623  0.2148  0.0849  0.0252  0.0060  0.0012  0.0002  4 × 10⁻⁵
Expected, e_k 305.4   362.3   214.8   84.9    25.2    6.0     1.2     0.2     0.0

(iii) Combining the last 4 categories to obtain expected frequencies greater than 5, we have:

k                     0      1      2      3     4     5 or more
No. of policies, f_k  310    365    202    88    26    9
Expected, e_k         305.4  362.3  214.8  84.9  25.2  7.4

This gives

$$\chi^2 = \sum_k \frac{(f_k - e_k)^2}{e_k} = 0.0693 + 0.0201 + 0.7628 + 0.1132 + 0.0254 + 0.3459 = 1.3367$$

DF = 6 − 1 − 1 = 4, and from statistical tables, $\chi^2_{0.05,4} = 9.488$.

Therefore, we do not have evidence against the hypothesis that the number of claims comes from a Poisson(1.186) distribution.

(Alternatively if we only combine the last 3 categories, the expected frequencies for 5 and 6 or more policies are 6 and 1.4, with observed frequencies 6 and 3 respectively. These give $\chi^2 = 2.819$ on 5 DF, and with $\chi^2_{0.05,5} = 11.071$ the conclusion is the same as before.)
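The expected frequencies and the goodness-of-fit statistic in Question 9 can be generated with the recurrence from part (i). The sketch below uses unrounded expected frequencies, so the statistic may differ slightly in the later decimal places from the value obtained with the rounded table entries; the critical value 9.488 is copied from the solution rather than computed.

```python
import math

lam = 1.186
# Observed counts for k = 0, 1, 2, 3, 4 and the pooled cell "5 or more".
observed = [310, 365, 202, 88, 26, 9]
n = sum(observed)

# Build P(X = 0), ..., P(X = 4) using P(X = k + 1) = lam / (k + 1) * P(X = k).
probs = [math.exp(-lam)]
for k in range(4):
    probs.append(probs[-1] * lam / (k + 1))
# The pooled tail cell takes the remaining probability.
probs.append(1 - sum(probs))

expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 9.488  # upper 5% point on 4 df, from the tables
print([round(e, 1) for e in expected], round(chi2, 2), chi2 < critical_value)
```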

Page 206: ct32005-2010

10 (i) P(yes) = P(5,6) P(yes | main question) + P(1,2,3,4) P(yes | coin question)

$$= \frac{2}{6}p + \frac{4}{6}\cdot\frac{1}{2} = \frac{1}{3}(p + 1)$$

(ii) (a) $$L(p) \propto \left[\tfrac{1}{3}(p+1)\right]^{56}\left[1 - \tfrac{1}{3}(p+1)\right]^{100-56} \propto (p+1)^{56}(2-p)^{44}$$

(b) $\log L = 56\log(p+1) + 44\log(2-p) + \text{const.}$

$$\frac{d}{dp}\log L = \frac{56}{p+1} - \frac{44}{2-p} = \frac{56(2-p) - 44(p+1)}{(p+1)(2-p)} = \frac{68 - 100p}{(p+1)(2-p)}$$

Equate to zero ⇒ $68 - 100\hat{p} = 0 \;\therefore\; \hat{p} = \frac{68}{100} = 0.68$

(iii) Due to the invariance property of MLEs $\frac{1}{3}(\hat{p} + 1) = \hat{\theta}$

$$\therefore \frac{1}{3}(\hat{p} + 1) = \frac{56}{100} \;\therefore\; \hat{p} = \frac{168}{100} - 1 = \frac{68}{100} = 0.68$$

(iv) (a) $$\frac{d^2}{dp^2}\log L = -\frac{56}{(p+1)^2} - \frac{44}{(2-p)^2}$$

at $\hat{p} = 0.68$,

$$\frac{d^2}{dp^2}\log L = -\frac{56}{1.68^2} - \frac{44}{1.32^2} = -45.0938$$

(b) $CRlb = \frac{-1}{-45.0938} = 0.02218$ and $\hat{p} \approx N(p, 0.02218)$

(c) Approximate 95% CI for p is $\hat{p} \pm 1.96\sqrt{0.02218}$ giving 0.68 ± 0.292

(v) (a) Approximate 95% CI for p is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{100}}$$

giving

$$0.68 \pm 1.96\sqrt{\frac{0.68(1 - 0.68)}{100}} \Rightarrow 0.68 \pm 1.96(0.0466) \Rightarrow 0.68 \pm 0.091$$

Page 207: ct32005-2010

(b) Less accurate estimation is the penalty paid for using the randomised response method.
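The loss of precision from the randomised response design in Question 10 can be quantified by computing both interval half-widths side by side. The sketch below follows the steps of the solution (invariance for the MLE, second derivative of the log likelihood for the variance); variable names are illustrative.

```python
import math

yes, n = 56, 100

# Invariance: theta_hat = yes / n and theta = (p + 1) / 3, so p_hat = 3 * theta_hat - 1.
theta_hat = yes / n
p_hat = 3 * theta_hat - 1

# Second derivative of log L(p) = 56 log(p + 1) + 44 log(2 - p) at p_hat.
second_derivative = -yes / (p_hat + 1) ** 2 - (n - yes) / (2 - p_hat) ** 2
variance_rr = -1 / second_derivative

half_width_rr = 1.96 * math.sqrt(variance_rr)
# Direct questioning with the same estimate: ordinary binomial variance.
half_width_direct = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat, 2), round(half_width_rr, 3), round(half_width_direct, 3))
```

Only about one third of respondents answer the sensitive question, so the randomised design yields an interval roughly three times as wide for the same sample size.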

11 (i) We want to test H0: $\mu_A = \mu_B$ against H1: $\mu_A \ne \mu_B$.

Data give: $\bar{y}_A = 56.1/12 = 4.675$, $\bar{y}_B = 59.1/12 = 4.925$

$s_A^2 = (266.33 - 56.1^2/12)/11 = 0.36932$, $s_B^2 = (297.03 - 59.1^2/12)/11 = 0.54205$

Assuming that the two samples come from normal distributions with the same variance, we first compute the pooled variance as

$$s_p^2 = \frac{11s_A^2 + 11s_B^2}{22} = 0.455685$$

which gives

$$t = \frac{\bar{y}_A - \bar{y}_B}{s_p\sqrt{2/12}} = -0.907.$$

Critical values at 5% level are $t_{22}(0.025) = -2.074$ and $t_{22}(0.975) = 2.074$ so we don’t have evidence against H0 and conclude that the mean delay time is the same for claims associated with the two causes of illness.

(ii) Distribution of times can be skewed to the right, and we need a log transformation to normalise the data (for test to be valid).

(iii) (a) CI is given by

$$\left(\frac{s_A^2/s_B^2}{F_{11,11}(0.025)},\; (s_A^2/s_B^2) \times F_{11,11}(0.025)\right)$$

$F_{11,11}(0.025) = 3.478$ (using interpolation in the tables)

giving CI as (0.68134/3.478, 0.68134 × 3.478) = (0.196, 2.370).

(b) The value “1” is included in the 95% CI, meaning that the assumption of common variance made for the test is valid.

(iv) SST = 952.64 – 183²/36 = 22.39

SSB = (56.1² + 59.1² + 67.8²)/12 − 183²/36 = 6.155

⇒ SSR = 22.39 − 6.155 = 16.235

Source of variation   d.f.   SS       MSS
Between                2     6.155    3.078
Residual              33     16.235   0.492
Total                 35     22.390
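The entries of the analysis of variance table above can be rebuilt from the group totals and the overall sum of squares. This is a minimal sketch using the summary figures given in the question (totals 56.1, 59.1, 67.8 with 12 observations each, and ΣΣy² = 952.64); variable names are illustrative.

```python
# Group totals for causes A, B, C, each based on 12 observations.
group_totals = {"A": 56.1, "B": 59.1, "C": 67.8}
n_per_group = 12
sum_of_squares = 952.64

k = len(group_totals)
n_total = k * n_per_group
grand_total = sum(group_totals.values())
correction = grand_total ** 2 / n_total

sst = sum_of_squares - correction
ssb = sum(t ** 2 for t in group_totals.values()) / n_per_group - correction
ssr = sst - ssb

# Mean squares use k - 1 and n_total - k degrees of freedom.
f_stat = (ssb / (k - 1)) / (ssr / (n_total - k))
print(round(sst, 3), round(ssb, 3), round(ssr, 3), round(f_stat, 2))
```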

Page 208: ct32005-2010

F = 3.078/0.492 = 6.256 on (2,33) degrees of freedom.

From tables, $F_{2,33}(0.05)$ is between 3.276 and 3.295, and $F_{2,33}(0.01)$ is between 5.289 and 5.336.

We have strong evidence against the null hypothesis and conclude that the three mean delay times are not equal.

(v) The assumptions are that the data come from normal populations with constant variance.

(vi) The plot suggests that the normality assumption is reasonable and that variance does not depend on cause. Test seems valid.

12 (i) Scatterplot with suitable axes and clearly labelled:

There does not appear to be much of a relationship, perhaps a slight increasing linear relationship but it is weak with quite a bit of scatter.

(ii) n = 16

$$S_{tt} = 1496 - \frac{136^2}{16} = 340$$

$$S_{yy} = 12.531946 - \frac{14.160^2}{16} = 0.000346$$

$$S_{ty} = 120.518 - \frac{(136)(14.160)}{16} = 0.158$$

Page 209: ct32005-2010

$$\hat{\beta} = \frac{S_{ty}}{S_{tt}} = \frac{0.158}{340} = 0.0004647$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{t} = \frac{14.160}{16} - (0.0004647)\frac{136}{16} = 0.88105$$

Fitted line is y = 0.88105 + 0.000465t

(iii) (a) $s.e.(\hat{\beta}) = \sqrt{\dfrac{\hat{\sigma}^2}{S_{tt}}}$ where $\hat{\sigma}^2 = \dfrac{1}{n-2}\left(S_{yy} - \dfrac{S_{ty}^2}{S_{tt}}\right)$

$$\hat{\sigma}^2 = \frac{1}{14}\left(0.000346 - \frac{0.158^2}{340}\right) = 0.0000195$$

$$\therefore s.e.(\hat{\beta}) = \sqrt{\frac{0.0000195}{340}} = 0.000239$$

(b) Null hypothesis of “no linear relationship” is equivalent to H0: β = 0

We use $t = \dfrac{\hat{\beta}}{s.e.(\hat{\beta})} \sim t_{14}$ under H0: β = 0

Observed $t = \frac{0.000465}{0.000239} = 1.95$ and $t_{0.025,14} = 2.145$

So we must accept H0: no linear relationship at the 5% level.

(c) 95% CI is 0.000465 ± 2.145 × 0.000239 giving 0.000465 ± 0.000513 or (−0.000048, 0.000978)

(iv) (a) Observed $t = \frac{0.000487}{0.000220} = 2.21$ – this is greater than $t_{0.025,14} = 2.145$

So we reject H0: no linear relationship at the 5% level.

(b) 95% CI is 0.000487 ± 2.145 × 0.000220 giving 0.000487 ± 0.000472 or (0.000015, 0.000959)

The two CIs overlap substantially, so there is no evidence to suggest that the slopes are different.

Page 210: ct32005-2010

(c) Although the tests have different conclusions at the 5% level, the 100m observed t is only just inside the critical value of 2.145 and the 200m one is just outside. This in fact agrees with, rather than contradicts, the conclusion that the slopes are not different.
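The point made in part (c) is easier to see when both slopes are put through the same calculation. The sketch below recomputes the 100m slope and standard error from the summary sums given in the question and compares both t statistics with the tabulated critical value 2.145 (copied from the solution, not computed). Because it carries unrounded intermediate values, the 100m statistic may differ from the solution's 1.95 in the second decimal place.

```python
import math

n = 16
sum_t, sum_tt = 136.0, 1496.0
sum_y, sum_yy, sum_ty = 14.160, 12.531946, 120.518

s_tt = sum_tt - sum_t ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_ty = sum_ty - sum_t * sum_y / n

slope_100 = s_ty / s_tt
sigma2_hat = (s_yy - s_ty ** 2 / s_tt) / (n - 2)
se_100 = math.sqrt(sigma2_hat / s_tt)

t_crit = 2.145  # t(0.025) on 14 df, from the tables
t_100 = slope_100 / se_100
ci_100 = (slope_100 - t_crit * se_100, slope_100 + t_crit * se_100)

# The 200m slope and standard error are quoted directly in the question.
slope_200, se_200 = 0.000487, 0.000220
t_200 = slope_200 / se_200
ci_200 = (slope_200 - t_crit * se_200, slope_200 + t_crit * se_200)
print(round(t_100, 2), round(t_200, 2), ci_100, ci_200)
```

The two statistics sit on either side of 2.145 by a small margin, while the two intervals overlap almost entirely.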

END OF EXAMINERS’ REPORT