Gompertz Maximum Likelihood Estimation of Truncated Death Distributions
Joshua R. Goldstein
November 8, 2019
Big Picture
Our challenge
We see only part of the picture (e.g., deaths aged 70 to 87).
No estimates of who died before or after
How can we estimate death rates without denominators?
How can we estimate e(65) differences between groups?
Our idea
We can combine
  the observed distribution of deaths (over a limited range)
  our external knowledge of human mortality age-patterns
The hope is that this will produce good estimates of mortality rates, of e(65), and of differences between groups.
Today’s agenda
Intro to Maximum Likelihood Estimation
Example with simulated data
Attempt at validation with HMD
Preliminary try at NUMIDENT
Lessons and directions
Truncated Maximum Likelihood (in theory)
Philosophy
For given data X, we can compute a likelihood associated with a particular value of the parameter θ.
We then choose the θ̂ to maximize this likelihood.
A simple example
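Likelihood for observation i with value $x_i$:
$$L_i = L(\lambda \mid x_i) = f_\lambda(x_i)$$
Likelihood for all observations:
$$L = \prod_i L_i$$
Log-likelihood:
$$\mathcal{L} = \sum_i \log L_i$$
If we observe $x_1 = 3$ and $x_2 = 5$, and $f_\lambda(x) = \lambda e^{-\lambda x}$ is the exponential density, then
$$L(\lambda \mid x_1) = \lambda e^{-3\lambda}$$
$$L(\lambda \mid x_2) = \lambda e^{-5\lambda}$$
$$L(\lambda \mid x_1, x_2) = \lambda^2 e^{-8\lambda}$$
$$\mathcal{L} = \sum_i \log L_i = 2 \log \lambda - 8\lambda$$
$$\frac{d\mathcal{L}}{d\lambda} = \frac{2}{\lambda} - 8 = 0$$
So, $\hat{\lambda}_{MLE} = 2/8 = 0.25$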
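We did this by hand, but can also do it with the computer
## grid of candidate values for lambda
lambda.vec <- seq(.01, 1, .01)
## log-likelihood from the hand calculation, evaluated on the grid
loglik.vec = 2 * log(lambda.vec) - 8 * lambda.vec
plot(lambda.vec, loglik.vec)
## mark the grid value that maximizes the log-likelihood
abline(v = lambda.vec[which.max(loglik.vec)])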
[Figure: loglik.vec plotted against lambda.vec, with a vertical line at the maximizing value of lambda]
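The grid search can also be replaced by a numerical optimizer. The following is a minimal sketch using base R's optimize() on the negative log-likelihood of the same two observations; the names x.obs and negloglik.exp are ours, introduced only for illustration.

```r
## Observations from the hand calculation (x1 = 3, x2 = 5)
x.obs <- c(3, 5)

## Negative log-likelihood of the exponential model
## (hypothetical helper name, not from the slides)
negloglik.exp <- function(lambda, x) {
    -sum(dexp(x, rate = lambda, log = TRUE))
}

## One-dimensional minimization over the same range as the grid
opt <- optimize(negloglik.exp, interval = c(0.01, 1), x = x.obs)
lambda.hat <- opt$minimum
```

Up to the optimizer's tolerance, lambda.hat should agree with the hand-derived value of 2/8.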
For a truncated distribution, we observe only from a to b
We can define the conditional distribution
$$f_{\text{trunc}}(x) = \frac{f_\theta(x)}{\int_a^b f_\theta(x)\,dx} = \frac{f_\theta(x)}{F_\theta(b) - F_\theta(a)}$$
with likelihood
$$L(\theta \mid x) = \prod_i \frac{f_\theta(x_i)}{F_\theta(b) - F_\theta(a)}$$
And then we maximize that.
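To see the truncated likelihood at work before moving to Gompertz, here is a small sketch that carries the exponential example over to a truncated sample. The window (a = 5, b = 10), the sample size, and all names are assumptions made for illustration; the point is only that dividing by F(b) - F(a) recovers the true rate, whereas the ordinary exponential MLE applied to the truncated sample does not.

```r
set.seed(1)
lambda.true <- 0.25
x.all <- rexp(20000, rate = lambda.true)

## Keep only observations inside the (assumed) window [a, b]
a <- 5
b <- 10
x.trunc <- x.all[x.all > a & x.all < b]

## Negative log-likelihood of the truncated exponential:
## each density is divided by F(b) - F(a), written here as S(a) - S(b)
negloglik.trunc.exp <- function(lambda, x, a, b) {
    log.f <- dexp(x, rate = lambda, log = TRUE)
    log.denom <- log(pexp(a, rate = lambda, lower.tail = FALSE) -
                     pexp(b, rate = lambda, lower.tail = FALSE))
    -sum(log.f - log.denom)
}

fit.trunc <- optimize(negloglik.trunc.exp, interval = c(0.001, 2),
                      x = x.trunc, a = a, b = b)
lambda.hat.trunc <- fit.trunc$minimum

## Naive exponential MLE that ignores truncation, for comparison
lambda.hat.naive <- 1 / mean(x.trunc)
```

Comparing lambda.hat.trunc and lambda.hat.naive against lambda.true shows how much the normalizing term matters in this setup.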
Validation with Simulated Gompertz
Simulated Gompertz, without truncation
source("hmd_validation_functions.R")
N = 2000
set.seed(13)
x <- rgompertz.M(N, b = 1/10, M = 75)
Dx <- table(floor(x))
plot(names(Dx), Dx, type = "h",
     ylab = "Death Counts", xlab = "Age",
     main = "2000 Simulated Gompertz Deaths")
[Figure: 2000 Simulated Gompertz Deaths; x-axis: Age, y-axis: Death Counts]
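The sampler rgompertz.M() lives in hmd_validation_functions.R, which is not shown here. As a rough sketch of how such a sampler could work, the code below draws Gompertz ages at death by inverting the survival function. It assumes the hazard is h(x) = b exp(b(x - M)), so that M is the modal age at death; both that parameterization and the function name rgomp.mode.sketch are our assumptions, not the author's code.

```r
## Hypothetical stand-in for rgompertz.M(); assumes hazard
## h(x) = b * exp(b * (x - M)), which puts the mode of deaths at M
rgomp.mode.sketch <- function(N, b, M) {
    u <- runif(N)
    ## Survival: S(x) = exp(-exp(-b * M) * (exp(b * x) - 1));
    ## solving S(x) = u for x gives the draw below
    log(1 - log(u) * exp(b * M)) / b
}

set.seed(13)
x.sim <- rgomp.mode.sketch(2000, b = 1/10, M = 75)
Dx.sim <- table(floor(x.sim))
```

Because the seed and the random-number usage may differ from the original function, the simulated counts need not match the ones plotted in the slides.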
MLE estimation
source("hmd_validation_functions.R")
N = 1000
set.seed(13)
x <- rgompertz.M(N, b = 1/10, M = 75)
Dx <- table(floor(x))
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))
## [1] 0.09572687
(M.hat = exp(fit$par[2]))
## [1] 75.01612
How did we do?
[Figure: 2000 Simulated Gompertz Deaths, with MLE fit; x-axis: Age, y-axis: Death Counts]
Now artificially truncate to ages 65-90
l <- 65
h <- 90
x.trunc <- x[x > l & x < h]
Dx <- table(floor(x.trunc))
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = l, x.right = h,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))
## [1] 0.1083535
(M.hat = exp(fit$par[2]))
## [1] 75.42279
Plot the fit
[Figure: Simulated Truncated Gompertz, with MLE fit; x-axis: Age, y-axis: Deaths]
Other ingredients
A model
  Here we pick Gompertz, with parameters β (b in the code) and M.
Some code to implement optimization routine
Validation
  Test on simulated Gompertz deaths, to see if we estimate right values
  Test on HMD to see if it works with real data
Application
  NUMIDENT
  (Weighted Censoc)
Major Assumptions
Gompertz model is appropriate
Uniform coverage across ages to preserve the cohort distribution of deaths
Code
Gompertz functions
source("hmd_validation_functions.R")
##
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))
## [1] 0.1716812
(M.hat = exp(fit$par[2]))
## [1] 78.73153
Optimization
## wrapper function that calls optim()
counts.trunc.gomp.est <- function(Dx, x.left, x.right, b.start, M.start){...}
## our negative log-likelihood
d.counts.gomp.negLL <- function(par, Dx, x.left, x.right){...}
## usage
fit <- counts.trunc.gomp.est(Dx = my.Dx, x.left = 70, x.right = 87,
                             b.start = 1/9, M.start = 80)
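The bodies of these two functions are elided in the slides. The sketch below shows one way a truncated Gompertz likelihood for single-year death counts could be written; it is not the author's implementation. It assumes the hazard h(x) = b exp(b(x - M)), treats each count as deaths in the interval [age, age + 1), and optimizes on the log scale (which matches the exp(fit$par[...]) calls in the slides). All names ending in .sketch are hypothetical.

```r
## Gompertz survival under the assumed modal-age parameterization
pgomp.surv.sketch <- function(x, b, M) {
    exp(-exp(-b * M) * (exp(b * x) - 1))
}

## Negative log-likelihood for death counts by single year of age,
## conditional on dying inside the window [x.left, x.right)
negLL.sketch <- function(par, Dx, x.left, x.right) {
    b <- exp(par[1])   # parameters are optimized on the log scale
    M <- exp(par[2])
    age <- as.numeric(names(Dx))
    p.age <- pgomp.surv.sketch(age, b, M) - pgomp.surv.sketch(age + 1, b, M)
    p.window <- pgomp.surv.sketch(x.left, b, M) - pgomp.surv.sketch(x.right, b, M)
    -sum(as.numeric(Dx) * (log(p.age) - log(p.window)))
}

## Wrapper around optim(), mirroring the call signature used in the slides
trunc.gomp.est.sketch <- function(Dx, x.left, x.right, b.start, M.start) {
    optim(par = c(log(b.start), log(M.start)), fn = negLL.sketch,
          Dx = Dx, x.left = x.left, x.right = x.right)
}

## Example: simulate Gompertz deaths (b = 1/10, M = 75) by inverse CDF,
## keep only ages 65 to 89, and refit
set.seed(13)
u <- runif(20000)
x <- log(1 - log(u) * exp(0.1 * 75)) / 0.1
x.trunc <- x[x >= 65 & x < 90]
Dx <- table(floor(x.trunc))
fit <- trunc.gomp.est.sketch(Dx, x.left = 65, x.right = 90,
                             b.start = 1/9, M.start = 80)
b.hat <- exp(fit$par[1])
M.hat <- exp(fit$par[2])
```

Working on the log scale keeps both parameters positive without constrained optimization, at the cost of needing to exponentiate the results.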
Without Truncation
N = 1000
set.seed(13)
x <- rgompertz.M(N, b = 1/10, M = 75)
Dx <- table(floor(x))
fit <- counts.trunc.gomp.est(Dx = Dx, x.left = 0, x.right = 200,
                             b.start = 1/9, M.start = 80)
(b.hat = exp(fit$par[1]))
## [1] 0.09572687
(M.hat = exp(fit$par[2]))
## [1] 75.01612
Plot the fit
[Figure: Dx plotted against names(Dx) (death counts by age)]
Validating Method with HMD
Our approach
Use HMD death counts for the years and ages for which we have NUMIDENT deaths (1988 to 2005, ages 65 and over)
Fit Gompertz and see if we match
  Mortality rates
  e(65)
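Fitted Gompertz parameters imply a value of e(65), which is what gets compared with HMD. A minimal sketch of that calculation, assuming the hazard h(x) = b exp(b(x - M)) and integrating conditional survival numerically, is given below; gomp.e.sketch is a hypothetical helper, and the parameter values are the ones used in the simulations rather than fitted HMD values.

```r
## Remaining life expectancy at age x0 under a Gompertz hazard
## h(x) = b * exp(b * (x - M)) (assumed parameterization)
gomp.e.sketch <- function(x0, b, M) {
    ## probability of surviving from age x0 to age x0 + t
    surv.cond <- function(t) {
        exp(-exp(b * (x0 - M)) * (exp(b * t) - 1))
    }
    integrate(surv.cond, lower = 0, upper = Inf)$value
}

e65 <- gomp.e.sketch(65, b = 1/10, M = 75)
e65.later.mode <- gomp.e.sketch(65, b = 1/10, M = 85)
```

Holding b fixed, a later modal age M should give a larger e(65), which offers a quick sanity check on any fitted values.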
Import and prepare HMD data
library(data.table)
## read in and define age and cohort
dt.mx <- fread("~/Documents/hmd/hmd_statistics/death_rates/Mx_1x1/USA.Mx_1x1.txt")
dt.mx[ , x := as.numeric(Age)]
## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion
dt.mx[Age == "110+" , x := 110]
dt.mx[, cohort := Year - x]
dt <- fread("~/Documents/hmd/hmd_statistics/deaths/Deaths_1x1/USA.Deaths_1x1.txt")
dt[ , x := as.numeric(Age)]
## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion
dt[Age == "110+" , x := 110]
dt[, cohort := Year - x]
## reshape to long format, one row per age x year x sex
dt.long <- melt(dt, measure.vars = c("Male", "Female"), variable.name = "sex", value.name = "Dx")
## make array, age x cohort x sex
## (my.cohorts is not defined on this slide; it must be set beforehand)
my.array <- dt.long[cohort %in% my.cohorts, xtabs(Dx ~ x + cohort + sex)]
dimnames(my.array)[[3]] <- c("m", "f") ## for backwards compatibility
Fit HMD
## [1] 1900
## [1] 1901
## [1] 1902
## [1] 1903
## [1] 1904
## [1] 1905
## [1] 1906
## [1] 1907
## [1] 1908
## [1] 1909
## [1] 1910
## [1] 1911
## [1] 1912
## [1] 1913
## [1] 1914
## [1] 1915
## [1] 1916
## [1] 1917
## [1] 1918
## [1] 1919
## [1] 1920
## [1] 1921
## [1] 1922
## [1] 1923
Plot parameter values
[Figure, two panels: b.vec plotted against my.cohorts; M.vec plotted against my.cohorts]
Look at cohort of 1920, pretending we observe only ages 68 to 85
[Figure: HMD females cohort of 1920, observed and fit; x-axis: names(Dx), y-axis: Dx]
Very high M is not crazy: Period life table for 2016 has mode at age 88.
e(65)
[Figure: Male cohort e(65) estimates from truncated Gompertz MLE; x-axis: cohort, y-axis: e(65)]
For reference, period e(65, 2016) = 18.36.
Mortality rates
Selected Mortality Rates
[Figure: MLE fits (dashed) vs HMD observations (solid), Males; x-axis: cohort, y-axis: Mac (log-scale); one curve per age, labeled 68 to 98 in steps of 2]
Overall fit is remarkably good.
But we shouldn't rely on it for unobserved ages (see old-age decline in upper right).
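For a comparison of this kind, fitted mortality rates at each age follow directly from the Gompertz parameters. The sketch below assumes the hazard h(x) = b exp(b(x - M)) and, purely as an illustration, plugs in the b.hat and M.hat printed earlier for the truncated simulated data (not HMD fits); gomp.hazard.sketch is a hypothetical helper name.

```r
## Gompertz hazard under the assumed modal-age parameterization
gomp.hazard.sketch <- function(x, b, M) {
    b * exp(b * (x - M))
}

## Ages labeled in the figure
ages <- seq(68, 98, by = 2)

## Illustrative parameter values taken from the simulated truncated fit
mx.fit <- gomp.hazard.sketch(ages, b = 0.1083535, M = 75.42279)
names(mx.fit) <- ages
```

On a log scale these fitted rates are linear in age with slope b, so any curvature in observed rates at the oldest ages cannot be reproduced by the model. Whether to evaluate the hazard at exact age x or at x + 0.5 when comparing with rates for the interval [x, x + 1) is a choice the slides do not specify.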
Period perspective
[Figure: MLE estimated hazards (dashed) vs HMD (solid), period perspective by age; x-axis: Year, y-axis: Mxt (log-scale); one curve per age, labeled 68 to 98 in steps of 2]
Trying out the method with NUMIDENT