
Gompertz Maximum Likelihood Estimation of Truncated Death Distributions

Joshua R. Goldstein

November 8, 2019

Joshua R. Goldstein, Gompertz Maximum Likelihood Estimation of Truncated Death Distributions, November 8, 2019, 1 / 32

Big Picture


Our challenge

We see only part of the picture (e.g., deaths aged 70 to 87).

No estimates of who died before or after

How can we estimate death rates without denominators?

How can we estimate e(65) differences between groups?


Our idea

We can combine

the observed distribution of deaths (over a limited range)

our external knowledge of human mortality age-patterns

The hope is that this will produce good estimates of mortality rates, of e(65), and of differences between groups


Today’s agenda

Intro to Maximum Likelihood Estimation

Example with simulated data

Attempt at validation with HMD

Preliminary try at NUMIDENT

Lessons and directions


Truncated Maximum Likelihood (in theory)


Philosophy

For given data X, we can define a likelihood associated with a particular value of the parameter θ.

We then choose the θ̂ to maximize this likelihood.


A simple example

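Likelihood for observation i with value x_i:

$$L_i = L(\lambda \mid x_i) = f_\lambda(x_i)$$

Likelihood for all observations:

$$L = \prod_i L_i$$

Log-likelihood:

$$\mathcal{L} = \sum_i \log L_i$$

If we observe x_1 = 3 and x_2 = 5, then, for the exponential density f_λ(x) = λe^{−λx},

$$L(\lambda \mid x_1) = \lambda e^{-3\lambda}$$

$$L(\lambda \mid x_2) = \lambda e^{-5\lambda}$$

$$L(\lambda \mid x_1, x_2) = \lambda^2 e^{-8\lambda}$$

$$\mathcal{L} = \sum_i \log L_i = 2 \log \lambda - 8\lambda$$

$$\frac{d\mathcal{L}}{d\lambda} = \frac{2}{\lambda} - 8 = 0$$

So,

$$\hat{\lambda}_{MLE} = 2/8 = 0.25$$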


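We did this by hand, but can also do it with the computer

## evaluate the log-likelihood on a grid of lambda values
lambda.vec <- seq(.01, 1, .01)
loglik.vec <- 2 * log(lambda.vec) - 8 * lambda.vec
plot(lambda.vec, loglik.vec)
## mark the grid value with the highest log-likelihood
abline(v = lambda.vec[which.max(loglik.vec)])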

[Figure: loglik.vec plotted against lambda.vec (0 to 1), with a vertical line at the maximizing value]
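The grid search only gets within the grid spacing of the maximum. As a cross-check, a minimal sketch that builds the same log-likelihood from the two observations and hands it to R's one-dimensional optimizer; the names `negLL` and `lambda.hat` are illustrative and not from the original slides.

```r
## the two observations from the worked example
x <- c(3, 5)

## negative log-likelihood of an exponential rate parameter
negLL <- function(lambda) -sum(dexp(x, rate = lambda, log = TRUE))

## minimize over the same range used for the grid
fit <- optimize(negLL, interval = c(0.01, 1))
lambda.hat <- fit$minimum
lambda.hat
```

The result should agree with the hand calculation (2/8) up to the optimizer's tolerance.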

For a truncated distribution, we observe only from a to b

We can define the conditional distribution

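$$f_{trunc}(x) = \frac{f_\theta(x)}{\int_a^b f_\theta(x)\,dx} = \frac{f_\theta(x)}{F_\theta(b) - F_\theta(a)}$$

with likelihood

$$L(\theta \mid x) = \prod_i \frac{f_\theta(x_i)}{F_\theta(b) - F_\theta(a)}$$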

And then we maximize that.
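To see why the denominator matters, here is a small sketch that continues the exponential example rather than the Gompertz case. It simulates exponential data, keeps only the values inside a window, and compares the naive estimate (ignoring truncation) with the truncated MLE. The window limits, sample size, seed, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)
lambda.true <- 0.25
lo <- 2    # left truncation point (a in the formula)
hi <- 10   # right truncation point (b in the formula)

## simulate, then keep only what falls inside the window
x.all <- rexp(20000, rate = lambda.true)
x.obs <- x.all[x.all > lo & x.all < hi]

## negative log-likelihood of the truncated density f(x) / (F(hi) - F(lo))
trunc.negLL <- function(lambda) {
  log.dens <- dexp(x.obs, rate = lambda, log = TRUE)
  log.mass <- log(pexp(hi, rate = lambda) - pexp(lo, rate = lambda))
  -sum(log.dens - log.mass)
}

lambda.naive <- 1 / mean(x.obs)   # untruncated MLE applied to truncated data
lambda.trunc <- optimize(trunc.negLL, interval = c(0.01, 1))$minimum
c(naive = lambda.naive, truncated = lambda.trunc)
```

In this setup the naive estimate is biased because the window removes both short and long durations, while the truncated likelihood should land near the true rate.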


Validation with Simulated Gompertz


Simulated Gompertz, without truncation

source("hmd_validation_functions.R")
N <- 2000
set.seed(13)
## simulate ages at death from a Gompertz with slope b and mode M
x <- rgompertz.M(N, b = 1/10, M = 75)
## tabulate deaths by single year of age
Dx <- table(floor(x))
plot(names(Dx), Dx, type = "h",
     ylab = "Death Counts", xlab = "Age",
     main = "2000 Simulated Gompertz Deaths")

[Figure: "2000 Simulated Gompertz Deaths"; x-axis: Age, y-axis: Death Counts]
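The function rgompertz.M() comes from hmd_validation_functions.R, which is not shown in the slides. As a rough idea of what such a sampler could look like, here is a sketch using inverse-CDF sampling. It assumes the hazard takes the form h(x) = b * exp(b * (x - M)), so that M is the modal age at death; the name `rgomp.sketch` and this parameterization are assumptions, not the author's code.

```r
## Hypothetical stand-in for rgompertz.M(): inverse-CDF sampling.
## Assumed survival function: S(x) = exp(-exp(-b * M) * (exp(b * x) - 1)).
rgomp.sketch <- function(N, b, M) {
  u <- runif(N)                      # u plays the role of S(x)
  log(1 - log(u) * exp(b * M)) / b   # solve S(x) = u for x
}

set.seed(13)
b <- 1/10
M <- 75
x <- rgomp.sketch(2000, b = b, M = M)
Dx <- table(floor(x))

## theoretical median from S(x) = 1/2, for comparison with the sample
median.theory <- log(1 + log(2) * exp(b * M)) / b
c(sample = median(x), theory = median.theory)
```

Comparing the sample median with the closed-form median is a quick way to confirm the sampler matches the assumed survival function.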

MLE estimation

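source("hmd_validation_functions.R")
N <- 1000
set.seed(13)
x <- rgompertz.M(N, b = 1/10, M = 75)
Dx <- table(floor(x))
## no truncation: the window 0 to 200 covers every age at death
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
## parameters are estimated on the log scale, so exponentiate
(b.hat = exp(fit$par[1]))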

## [1] 0.09572687

(M.hat = exp(fit$par[2]))

## [1] 75.01612


How did we do?

[Figure: "2000 Simulated Gompertz Deaths, with MLE fit"; x-axis: Age, y-axis: Death Counts]

Now artificially truncate to ages 65-90

l <- 65
h <- 90
## keep only deaths inside the observation window
x.trunc <- x[x > l & x < h]
Dx <- table(floor(x.trunc))
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = l, x.right = h,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))

## [1] 0.1083535

(M.hat = exp(fit$par[2]))

## [1] 75.42279


Plot the fit

[Figure: "Simulated Truncated Gompertz, with MLE fit"; x-axis: Age, y-axis: Deaths]

Other ingredients

A model: here we pick Gompertz, with parameters β and M.

Some code to implement the optimization routine

Validation: test on simulated Gompertz deaths, to see if we estimate the right values; test on HMD to see if it works with real data

Application: NUMIDENT (Weighted Censoc)
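The slides do not write out the Gompertz model itself. One common way to parameterize it with a slope β and a modal age at death M is given below; this specific form is an assumption for illustration and may differ from what hmd_validation_functions.R uses.

```latex
\begin{align*}
h(x) &= \beta \, e^{\beta (x - M)} && \text{(hazard)} \\
S(x) &= \exp\!\left\{ -e^{-\beta M} \left( e^{\beta x} - 1 \right) \right\} && \text{(survival)} \\
f(x) &= h(x) \, S(x) && \text{(density of deaths)} \\
\frac{d}{dx} \log f(x) &= \beta - h(x) = 0 \;\Longrightarrow\; x = M && \text{(the density peaks at } M\text{)}
\end{align*}
```

Under this form the hazard at the modal age equals β, which is why M can be read directly as the mode of the death distribution.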


Major Assumptions

Gompertz model is appropriate

Uniform coverage across ages to preserve the cohort distribution of deaths


Code

Gompertz functions

source("hmd_validation_functions.R")

fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))

## [1] 0.1716812

(M.hat = exp(fit$par[2]))

## [1] 78.73153

Optimization

## wrapper function that calls optim()
counts.trunc.gomp.est <- function(Dx, x.left, x.right, b.start, M.start){...}

## our negative log-likelihood
d.counts.gomp.negLL <- function(par, Dx, x.left, x.right){...}

## usage
fit <- counts.trunc.gomp.est(Dx = my.Dx, x.left = 70, x.right = 87,
                             b.start = 1/9, M.start = 80)
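The bodies of these two functions are elided in the slides. The sketch below shows one way such a pair could be written. It is not the author's implementation. It assumes the hazard h(x) = b * exp(b * (x - M)), treats the counts as multinomial over single-year age bins inside the window, and optimizes over log(b) and log(M), which is consistent with the exp(fit$par) calls above. All function names ending in `.sketch` are hypothetical.

```r
## assumed Gompertz CDF with slope b and modal age M
pgomp.sketch <- function(x, b, M) 1 - exp(-exp(-b * M) * (exp(b * x) - 1))

## hypothetical sampler, used only to generate test data
rgomp.sketch <- function(N, b, M) log(1 - log(runif(N)) * exp(b * M)) / b

## negative log-likelihood of binned counts, conditional on the window
negLL.sketch <- function(par, Dx, x.left, x.right) {
  b <- exp(par[1])
  M <- exp(par[2])
  ages <- as.numeric(names(Dx))
  counts <- as.numeric(Dx)
  ## probability of dying in [age, age + 1)
  bin.prob <- pgomp.sketch(ages + 1, b, M) - pgomp.sketch(ages, b, M)
  ## probability of dying inside the observed window
  window.prob <- pgomp.sketch(x.right, b, M) - pgomp.sketch(x.left, b, M)
  -sum(counts * (log(bin.prob) - log(window.prob)))
}

## wrapper around optim(), starting from log-scale initial values
fit.sketch <- function(Dx, x.left, x.right, b.start, M.start) {
  optim(par = c(log(b.start), log(M.start)), fn = negLL.sketch,
        Dx = Dx, x.left = x.left, x.right = x.right)
}

set.seed(13)
x <- rgomp.sketch(20000, b = 1/10, M = 75)
Dx <- table(floor(x[x >= 65 & x < 90]))
fit <- fit.sketch(Dx, x.left = 65, x.right = 90, b.start = 1/9, M.start = 80)
b.hat <- exp(fit$par[1])
M.hat <- exp(fit$par[2])
c(b.hat = b.hat, M.hat = M.hat)
```

Working on the log scale keeps both parameters positive without needing a bounded optimizer, so the default Nelder-Mead method in optim() can be used as is.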


Without Truncation

N <- 1000
set.seed(13)
x <- rgompertz.M(N, b = 1/10, M = 75)
Dx <- table(floor(x))
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))

## [1] 0.09572687

(M.hat = exp(fit$par[2]))

## [1] 75.01612


Plot the fit

[Figure: Dx plotted against names(Dx) (age), with the fit]

Validating Method with HMD


Our approach

Use HMD death counts for the years and ages for which we have NUMIDENT deaths

(1988 to 2005, ages 65 and over)

Fit Gompertz and see if we match:

Mortality rates

e(65)
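Both target quantities follow from the fitted parameters. A minimal sketch, again assuming the hazard form h(x) = b * exp(b * (x - M)): mortality rates are read off the hazard, and e(65) is the area under the survival curve above age 65 divided by survival to 65. The function names and the upper integration limit of 150 are assumptions for illustration.

```r
## assumed Gompertz hazard and survival with slope b and modal age M
hgomp.sketch <- function(x, b, M) b * exp(b * (x - M))
sgomp.sketch <- function(x, b, M) exp(-exp(-b * M) * (exp(b * x) - 1))

## remaining life expectancy at age x0 by numerical integration;
## x.max = 150 is an assumed cutoff beyond which survival is negligible
ex.gomp.sketch <- function(x0, b, M, x.max = 150) {
  area <- integrate(sgomp.sketch, lower = x0, upper = x.max, b = b, M = M)$value
  area / sgomp.sketch(x0, b, M)
}

## example with the parameters used in the simulations
e65 <- ex.gomp.sketch(65, b = 1/10, M = 75)
mx <- hgomp.sketch(seq(68, 98, by = 2), b = 1/10, M = 75)
e65
```

In practice b and M would be replaced by b.hat and M.hat from each cohort's fit, and the results compared with the corresponding HMD values.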


Import and prepare HMD data

library(data.table)
## read in and define age and cohort
dt.mx <- fread("~/Documents/hmd/hmd_statistics/death_rates/Mx_1x1/USA.Mx_1x1.txt")
dt.mx[ , x := as.numeric(Age)]

## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion

## the open age group "110+" is not numeric, so set it by hand
dt.mx[Age == "110+" , x := 110]
dt.mx[, cohort := Year - x]
dt <- fread("~/Documents/hmd/hmd_statistics/deaths/Deaths_1x1/USA.Deaths_1x1.txt")
dt[ , x := as.numeric(Age)]

## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion

dt[Age == "110+" , x := 110]
dt[, cohort := Year - x]
## make array, age x cohort x sex
dt.long <- melt(dt, measure.vars = c("Male", "Female"),
                variable.name = "sex", value.name = "Dx")
## my.cohorts is defined elsewhere (not shown on the slide)
my.array <- dt.long[cohort %in% my.cohorts, xtabs(Dx ~ x + cohort + sex)]
dimnames(my.array)[[3]] <- c("m", "f") ## for backwards compatibility


Fit HMD

## [1] 1900
## [1] 1901
## [1] 1902
## [1] 1903
## [1] 1904
## [1] 1905
## [1] 1906
## [1] 1907
## [1] 1908
## [1] 1909
## [1] 1910
## [1] 1911
## [1] 1912
## [1] 1913
## [1] 1914
## [1] 1915
## [1] 1916
## [1] 1917
## [1] 1918
## [1] 1919
## [1] 1920
## [1] 1921
## [1] 1922
## [1] 1923

Plot parameter values

[Figure, two panels: b.vec against my.cohorts and M.vec against my.cohorts, cohorts 1900 to 1920]

Look at cohort of 1920, pretending we observe only ages 68 to 85

[Figure: "HMD females cohort of 1920, observed and fit"; x-axis: names(Dx) (age, 65 to 100), y-axis: Dx]

Very high M is not crazy: Period life table for 2016 has mode at age 88.

e(65)

[Figure: "Male cohort e(65) estimates from truncated Gompertz MLE"; x-axis: cohort (1900 to 1920), y-axis: e(65)]

For reference, period e(65, 2016) = 18.36.

Mortality rates


Selected Mortality Rates

[Figure: "MLE fits (dashed) vs HMD observations (solid), Males"; x-axis: cohort, y-axis: Mac (log scale); one line per age, 68 to 98 in steps of 2]

Overall fit is remarkably good.

But we shouldn’t rely on it for unobserved ages (see the old-age decline in the upper right).

Period perspective

[Figure: "MLE estimated hazards (dashed) vs HMD (solid), period perspective by age"; x-axis: Year (1990 to 2005), y-axis: Mxt (log scale); one line per age, 68 to 98 in steps of 2]

Trying out the method with NUMIDENT
