Gompertz overhauled: Adaptation of a 200 years old distribution to the modern style of dying
If he could have looked 200 years into the future, Benjamin Gompertz (1779-1865) would surely have been pleased to see how well his "law of mortality" holds up under conditions vastly different from those where they were formulated. In particular, how it nowadays outperforms the more modern and versatile Weibull distribution, Gompertz is FG(x) = 1 - exp(-b/a (eax-1))
vs Weibull
FW(x) = 1 - exp(-(x/a)k)
. I imagine Gompertz, having applied his standard set of tools, could have been spot on in his criticism of the Weibull as a mortality law:
"Look here. All the mortality data I have looked into approximately follow a geometric progression of hazard rates, hn = h0αn
[1]
But the Weibull has something in between an arithmetic and a geometric progression. Counting from age x0 we get: hn = h0(1+n/x0)k-1
. In general, that is a worse fit to the data, and the lower h0 and the higher n are, the less adept it gets when it comes to capturing the more challenging features of human mortality."
But low h0 and high n are the most important characteristics of contemporary mortality in developed countries. We can illustrate this by applying maximum likelihood estimation (MLE) of Weibull, Gompertz, Gompertz-Makeham and "slightly generalized Gompertz" to what model-wise may be one of the most challenging mortality datasets: The 2017 table for Norwegian women. There are quite a few "generalized Gompertz" distributions available, but I haven't seen one labeled "slightly generalized Gompertz", so I use that for the distribution FG3(x) = 1 - exp(-b/a (eaxk-1))
.
Some purists may frown upon this use of "synthetic" data, but I think that criticism overshoots a bit here: The main bulk of deaths occur between ages 75 and 95, and the mortality changes over short time spans are not big enough to make the table approach invalid for model evaluation and estimation.
MLE fitting using optimx and nlminb
The good'ol nlm function of base R doesn't quite cut it here, and I employed the optimx package [2], where the nlminb function proved to be the best choice for this task. I tried to help it by following the book and furnishing gradient and hessian functions, but it didn't quite work out, and I deemed the basic results obtained good enough for the present purpose. Beware that with several parameters, quite different combinations may give quite similar end results. If that is considered a bug, it is a bug of the model.
I think fig 1 quite clearly brings out the strengths and weaknesses of the distributions employed. First, we may say that "re-enacted" Gompertz' principally based criticism of the Weibull distribution is completely vindicated: Weibull isn't able to adapt to the sharply focused contemporary mortality, and strongly overestimates deaths from 60 to 80, in addition to furnishing us with a small, non-existent army of centenarians. Standard Gompertz is far better, but fails to capture the 70-90's deviations from linear log-hazard rate mentioned in [1].
Gompertz-Makeham (GM) and "slightly generalized Gompertz" (FG3) both add a parameter: GM for including a constant component in the hazard rate,
log(h(x)) = log(b eax + λ)
. FG3 models an often occurring kind of non-linearity in the log-hazard rate:
log(h(x)) = log(b) + log(k) + axk + (k-1)log(x)
.
While they approximate marginally better than plain Gompertz, they both fail to capture the deviations from linearity adequately. They overestimate mortality ages 70-80, underestimate it around 90, and don't fully get the sharp peak around 90.
As a mathematician, Gompertz probably appreciated the intricacies involved in focused mortality patterns, and I'm not sure he would rate the 2018 performance of his method so badly. But we can be quite sure that as an expert actuarian, he wouldn't feel content with 15-20% error in predicting deaths in an important age group. So he would surely agree that his method needed an overhaul, while probably insisting the the overhaul be local. For the overall performance of his method is still quite good.
There is clearly not just one way to do this overhaul. But FG3 has told us that the "generalization" approach may not be worthwhile. Maybe we need something "special" rather than "general"? The good performance of standard Gompertz at ages 95-100 may give us some hints: Don't introduce something that deviates a lot from the log-hazard line there.
Overhaul requirements
The overhauled method must be adaptive, so we need parameters. They should come in addition to (a,b)
- As few additional parameters as possible, we accept some hard-coding in the result
- Localized: Effect of modification minimal outside of age span ca 70-95
- Perturbation with strong damping, exponential may be used for that
- The deviations to model suggest an anti-symmetric perturbation, like x3+x
- Localization may be done by translation x->x-h
- We would like perturbation function l(x) to have a continous derivative
- We would like to have
log(h(x)) ≈ log(b) + a(x+l(x)) + δ(x)
with δ(x) "small"
The requirements indicate that we could probably make good use of at least 3 parameters (c,d,h): Perturbation could be dimensioned by c, its exponential dampening regulated by d and the localization (around 85-90 years of age) byh. Giving us something like:
l(x) = c((x-h)3 + A(x-h))/(1+exp(d |x-h|))
The parameter A will be tuned "manually", and then hard-coded into the model.
Maxima: Visiting a friend of R
With the requirements, and a basic idea for solution, it may be time to turn to one of R's good, but maybe underrated and underexploited friends, the computer algebra system (CAS) of ancient Lisp lineage, Maxima[3]. I use it with the wxmaxima user interface. It's not that Maxima is so indispensable as a tool, but it is very good for structured approaches and quality control.
Defining l(x) in Maxima, we can try it out with different values of the parameters, we can differentiate (albeit it may be best to carry out that manually, because of the potential discontinuity of l'(x) at x=h. And more importantly, we can get assistance with the pdf and the log-likelihood. We can see that it does in fact have a continous derivative at x=h, but with a small cusp. The higher order discontinuities have no impact on our mortality modeling.
As an example, here is the "raw" MathML [4] output from wxmaxima, giving the log-likelihood for
FGO(x) = 1 - exp(-a/b (ea(x+l(x))-1))
.
Saved as a function, it can be edited and used in the R code. Note that Maxima's straight use of
d/dx |x| = x/|x|
is not a good way to implement that functionality in R.
Running optimx with different initial conditions will invariably lead to different parameter estimates, many of which may meet the convergence criteria, and this may feel a bit frustrating. Furthermore, maximizing the log-likelihood carefully may not return the best parameter set for your purposes, some limited deviation from maximality may be warranted. But these issues must be considered necessary consequences of the parametrization.
One reason the more plain Newton-based routine nlm is less useful in such situations, is that the parameters may get very close to 0. But several functions may break on negative input, so we may need to limit the search - something nlminb and a few other easily available routines allow for.
Fig 2 shows that overhauling Gompertz by adding 3 parameters for modeling local non-linearity in the log-hazard function may indeed be enough to obtain good fits. Most of the overestimation at 60-80 is gone, and the mortality peak at 90 is well captured. The "costs" in this concrete case are somewhat larger deviations at the oldest ages, and in addition, small irregularities are "neglected". Different sets of parameters yield slightly different results in this respect.
Similar results are obtained by adding the l(x) modification to Gompertz-Makeham, but I don't show them here. With more parameters, we also potentially get more indetermination and more job choosing between essentially equivalent parameter sets. And as we can see from the plots, at the ages 25-50, where the fits are really improved by the Makeham parameter λ the absolute deviation between estimated and observed deaths is rather small.
The full specification of the overhauled Gompertz approximation for 2017 Norwegian female mortality thus becomes:
FGO(x) = 1 - exp(-0.115/3.9e-6 (e0.115(x+l(x))-1))
.
l(x) = 0.04((x-92.5)3 + 6(x-92.5))/(1+exp(0.35 abs(x-92.5)))
Fig 3. Norwegian male (blue) and female (red) mortality 1967-2017 approximated by overhauled Gompertz. Basic Gompertz shown for reference. |
A 50 years perspective
Fig 3 illustrates how the overhauled version also can be applied to older mortality data, but 50 years ago, the fit of the basic Gompertz distribution might be deemed adequate both for women and men. The estimated value of the c parameter was much smaller 50 years ago, but the 3 extra parameters might contribute to better fit also then. Therefore, it is not necessarily very efficient simply to set c=0 for older data. For older data, there may also be more to be gained by including the Makeham λ parameter. It should be noted, though, that Gompertz-Makeham in no way can account for the largest difference between today and 50 years ago: Infant mortality, which has decreased 6-8-fold. In 1967, about 2% of newborns died before 11, today, 2% of newborns are not expected to have died at 45, and about 97.5% are expected to reach 50.
The peak in deaths has become higher and more narrow over time, and in particular for women, less regular. About 36% of all female deaths now occur between 85 and 92, and fig 3 indicates a tendency for the men's mortality curves to resemble that of women 10-20 years earlier, just a little bit lower and translated a few years to the left. The coefficient plots shown in [1] may be interpreted as indication of a degree of convergence of male and female mortality, but what kind of evidence is extrapolating based on? In Norway, cardiovascular disease has over the years become a less prominent cause of death, while cancer mortality has increased. But what might the impact of that be when about 50% of male deaths and 62% of female deaths occur at age 85 or older?
One might try to assess the development in mortality by looking at plots of the coefficients of overhauled Gompertz, but the optimization procedure used here is not stable enough to produce estimates where the parameters, without further processing or quality control, can be organized in time series in any meaningful way. That is also a result of the higher degree of randomness introduced when irregularities are modeled more precisely. This also applies to derived indicators like the mode, which with overhauled Gompertz does not have the same predictable relation to the whole distribution as it has with basic Gompertz. The parameters (a,b) of the basic linear log-hazard should in principle be displayed in the fitted model, but the 5 parameters (a,b,c,d,h) are dependent, and different combinations may produce quite similar model fits.
The basic R machinery in use
When defining new distributions, functions for the pdf f(x,para) and the negative of the log-likelihood minusLogLikf(para) is in many cases enough. For simplicity, I don't try to supply data to the likelihood function as parameter(s), when it is to be used on several data sets I rather embed it in a wrapper function that takes the data as parameter(s) and prepares it for the optimization procedures. In this case, data is provided as a frequency table, with x holding the case values and dx the number of cases (sum(dx)=100000).
When parameter constraints are provided for optimx (here lower=c(0,0,0,0,0)), several entries in the method list may be excluded. I tend to keep a long list intact, as it may happen that the constraints can be dropped when we have zoomed in on the correct parameter neighborhoods.
fGompAugm5p <- function(x,para) {
a <- para[1] ; b <- para[2] ; c <- para[3] ; d <- para[4] ; h <- para[5] ;
xepsi <- 2 ; l <- 1 ;
diffact <- ifelse(x < h,-1,1) ;
hxfunc <-c*((x-h)^3 + 3*xepsi*(x-h))/(l + exp(d*abs(x-h))) ;
ldxfunc <- 1 + 3*c*((x-h)^2 + 1*xepsi)/(l + exp(d*abs(x-h))) + -d*c*diffact*((x-h)^3
+ 3*xepsi*(x-h))*exp(d*abs(x-h))/(l + exp(d*abs(x-h)))^2 ;
b*ldxfunc*exp(a*(x + hxfunc))*exp(-b/a*exp(a*(x+hxfunc)) + b/a)
}
minusLogLikGompAugm5p <- function(para) {
a <- para[1] ; b <- para[2] ; c <- para[3] ; d <- para[4] ; h <- para[5] ;
n <- sum(dx) ; xepsi <- 2 ; l <- 1 ;
hxfunc <-c*((x-h)^3 + 3*xepsi*(x-h)) /(l+ exp(d*abs(x-h)))
diffact <- ifelse(x < h,-1,1)
ldxfunc <- 1 + 3*l*c*((x-h)^2 + 1*xepsi)/ (l + exp(d*abs(x-h))) -d*c*diffact*((x-h)^3
+ 3*xepsi*(x-h))*exp(d*abs(x-h))/(l + exp(d*abs(x-h)))^2 ;
logdxsum <- sum(dx*log(ldxfunc)) ;
xsum <- sum(dx*(x + hxfunc)) ; expsum <- sum(dx*exp(a*(x + hxfunc))) ;
- ( n*log(b) + n*b/a + logdxsum + a*xsum - b/a*expsum )
}
# M2017 - dx vector contains mortality data for males 2017
p5 <- c(0.115,3.9e-06,0.04,0.35,92.5);
g5x<-optimx(p5,fn=minusLogLikGompAugm5p,
method=c("Nelder-Mead","BFGS","nlm","L-BFGS-B","nlminb"),lower=c(0,0,0,0,0)) ;
References
1. "Gompertz rules" posting on this blog 201804022. Interview with John Nash about optimx
3. Background on Maxima
4. MathML introduction