Lee–Carter model

From HandWiki
Short description: Numerical algorithm for mortality forecasting

The Lee–Carter model is a numerical algorithm used in mortality forecasting and life expectancy forecasting.[1] The input to the model is a matrix of age-specific mortality rates ordered monotonically by time, usually with ages in columns and years in rows. The output is a forecasted matrix of mortality rates in the same format as the input.

The model uses singular value decomposition (SVD) to find:

  • A univariate time series vector [math]\displaystyle{ \mathbf{k}_t }[/math] that captures 80–90% of the mortality trend (here the subscript [math]\displaystyle{ t }[/math] refers to time),
  • A vector [math]\displaystyle{ \mathbf{b}_x }[/math] that describes the relative mortality at each age (here the subscript [math]\displaystyle{ x }[/math] refers to age), and
  • A scaling constant (referred to here as [math]\displaystyle{ s_1 }[/math] but unnamed in the literature).

Surprisingly, [math]\displaystyle{ \mathbf{k}_t }[/math] is usually linear, implying that gains in life expectancy are fairly constant year after year in most populations. Before computing the SVD, the age-specific mortality rates are transformed into [math]\displaystyle{ \mathbf{A}_{x,t} }[/math] by taking their logarithms and then centering them, i.e., subtracting their age-specific means over time. The age-specific mean over time is denoted by [math]\displaystyle{ \mathbf{a}_x }[/math]. The subscript [math]\displaystyle{ x,t }[/math] indicates that [math]\displaystyle{ \mathbf{A}_{x,t} }[/math] spans both age and time.

Many researchers adjust the [math]\displaystyle{ \mathbf{k}_t }[/math] vector by fitting it to empirical life expectancies for each year, using the [math]\displaystyle{ \mathbf{a}_x }[/math] and [math]\displaystyle{ \mathbf{b}_x }[/math] generated with SVD. When adjusted using this approach, changes to [math]\displaystyle{ \mathbf{k}_t }[/math] are usually small.

To forecast mortality, [math]\displaystyle{ \mathbf{k}_t }[/math] (either adjusted or not) is projected [math]\displaystyle{ n }[/math] years into the future using an ARIMA model. The corresponding forecasted [math]\displaystyle{ \mathbf{A}_{x,t+n} }[/math] is recovered by multiplying [math]\displaystyle{ \mathbf{k}_{t+n} }[/math] by [math]\displaystyle{ \mathbf{b}_x }[/math] and the first diagonal element of [math]\displaystyle{ \mathbf{S} }[/math] (when [math]\displaystyle{ \mathbf{U} \mathbf{S} \mathbf{V^*} = \text{svd}(\mathbf{A}_{x,t}) }[/math]). The actual mortality rates are then recovered by adding back the age-specific means [math]\displaystyle{ \mathbf{a}_x }[/math] and taking exponentials.

Because of the linearity of [math]\displaystyle{ \mathbf{k}_t }[/math], it is generally modeled as a random walk with trend. Life expectancy and other life table measures can be calculated from this forecasted matrix after adding back the means and taking exponentials to yield regular mortality rates.

In most implementations, confidence intervals for the forecasts are generated by simulating multiple mortality forecasts using Monte Carlo methods. A band of mortality between the 5th and 95th percentiles of the simulated results is taken as the forecast interval. These simulations extend [math]\displaystyle{ \mathbf{k}_t }[/math] into the future using randomization based on the standard error of [math]\displaystyle{ \mathbf{k}_t }[/math] derived from the input data.
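A minimal sketch of this simulation approach, assuming the common random-walk-with-drift specification for [math]\displaystyle{ \mathbf{k}_t }[/math] (the function name and the drift/standard-error estimates from first differences are illustrative, not prescribed by the original paper):

```python
import numpy as np

def mortality_band(k_t, n, sims=1000, lower=5, upper=95, seed=0):
    """Simulate future paths of k_t as a random walk with drift and
    return the (lower, upper) percentile band over the n-year horizon."""
    rng = np.random.default_rng(seed)
    steps = np.diff(k_t)                     # observed year-on-year changes
    drift, sigma = steps.mean(), steps.std(ddof=1)
    # each simulated path accumulates drift plus Gaussian innovations
    shocks = rng.normal(drift, sigma, size=(sims, n))
    paths = k_t[-1] + np.cumsum(shocks, axis=1)
    return np.percentile(paths, [lower, upper], axis=0)
```

Because the paths are random walks, the band widens with the forecast horizon, reflecting growing uncertainty in the mortality index.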

Algorithm

The algorithm seeks to find the least squares solution to the equation:

[math]\displaystyle{ \ln{(\mathbf{m}_{x,t})} = \mathbf{a}_x + \mathbf{k}_t \mathbf{b}_x + \epsilon_{x,t} }[/math]

where [math]\displaystyle{ \mathbf{m}_{x,t} }[/math] is the matrix of mortality rates for each age [math]\displaystyle{ x }[/math] in each year [math]\displaystyle{ t }[/math].

  1. Compute [math]\displaystyle{ \mathbf{a}_x }[/math] which is the average over time of [math]\displaystyle{ \ln{(\mathbf{m}_{x,t})} }[/math] for each age:
    [math]\displaystyle{ \mathbf{a}_x = \frac{\sum_{t=1}^{T}{\ln{(\mathbf{m}_{x,t})}}}{T} }[/math]
  2. Compute [math]\displaystyle{ \mathbf{A}_{x,t} }[/math] which will be used in SVD:
    [math]\displaystyle{ \mathbf{A}_{x,t} = \ln{(\mathbf{m}_{x,t})} - \mathbf{a}_x }[/math]
  3. Compute the singular value decomposition of [math]\displaystyle{ \mathbf{A}_{x,t} }[/math]:
    [math]\displaystyle{ \mathbf{U} \mathbf{S} \mathbf{V^*} = \text{svd}(\mathbf{A}_{x,t}) }[/math]
  4. Derive [math]\displaystyle{ \mathbf{k}_t }[/math], [math]\displaystyle{ s_1 }[/math] (the leading singular value), and [math]\displaystyle{ \mathbf{b}_x }[/math] from [math]\displaystyle{ \mathbf{U} }[/math], [math]\displaystyle{ \mathbf{S} }[/math], and [math]\displaystyle{ \mathbf{V^*} }[/math]:
    [math]\displaystyle{ \mathbf{k}_t = (u_{1,1}, u_{2,1}, ..., u_{t,1}) }[/math]
    [math]\displaystyle{ \mathbf{b}_x = (v_{1,1}, v_{1,2}, ..., v_{1,x}) }[/math]
  5. Forecast [math]\displaystyle{ \mathbf{k}_t }[/math] using a standard univariate ARIMA model to [math]\displaystyle{ n }[/math] additional years:
    [math]\displaystyle{ \mathbf{k}_{t+n} = \text{ARIMA}(\mathbf{k}_t, n) }[/math]
  6. Use the forecasted [math]\displaystyle{ \mathbf{k}_{t+n} }[/math], with the original [math]\displaystyle{ \mathbf{b}_x }[/math], and [math]\displaystyle{ \mathbf{a}_x }[/math] to calculate the forecasted mortality rate for each age:
    [math]\displaystyle{ \mathbf{m}_{x,t+n} = \exp(\mathbf{a}_x + s_1 \mathbf{k}_{t+n} \mathbf{b}_x) }[/math]
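The steps above can be sketched in Python with NumPy. This is an illustrative implementation, not the authors' code: the function names are invented, and step 5 is simplified to a random walk with drift (the drift estimated from the endpoints of [math]\displaystyle{ \mathbf{k}_t }[/math]) rather than a general ARIMA model:

```python
import numpy as np

def fit_lee_carter(m):
    """Fit the Lee-Carter model to a (years x ages) matrix of
    mortality rates; returns a_x, b_x, k_t and the scaling constant s1."""
    log_m = np.log(m)                    # log mortality rates
    a_x = log_m.mean(axis=0)             # step 1: age-specific means over time
    A = log_m - a_x                      # step 2: center each age column
    U, S, Vt = np.linalg.svd(A, full_matrices=False)  # step 3: SVD
    k_t = U[:, 0]                        # step 4: first column of U
    b_x = Vt[0, :]                       #         first row of V*
    s1 = S[0]                            #         leading singular value
    return a_x, b_x, k_t, s1

def forecast_lee_carter(a_x, b_x, k_t, s1, n):
    """Steps 5-6: project k_t as a random walk with drift for n years,
    then rebuild the log-rates and exponentiate to recover rates."""
    drift = (k_t[-1] - k_t[0]) / (len(k_t) - 1)
    k_future = k_t[-1] + drift * np.arange(1, n + 1)
    return np.exp(a_x + s1 * np.outer(k_future, b_x))
```

Note that the signs of [math]\displaystyle{ \mathbf{k}_t }[/math] and [math]\displaystyle{ \mathbf{b}_x }[/math] returned by the SVD are arbitrary but always cancel in the product, so the reconstructed rates are unaffected.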

Discussion

Without SVD or some other method of dimension reduction, the table of mortality data is a highly correlated multivariate time series, and the complexity of such multidimensional series makes them difficult to forecast. SVD has become widely used as a method of dimension reduction in many different fields, including by Google in their PageRank algorithm.

The Lee–Carter model was introduced by Ronald D. Lee and Lawrence Carter in 1992 with the article "Modeling and Forecasting U.S. Mortality" (Journal of the American Statistical Association 87 (September): 659–671).[2] The model grew out of their work in the late 1980s and early 1990s attempting to use inverse projection to infer mortality rates in historical demography.[3] The model has been used by the United States Social Security Administration, the US Census Bureau, and the United Nations. It has become the most widely used mortality forecasting technique in the world today.[4]

There have been extensions to the Lee–Carter model, most notably to account for missing years, correlated male and female populations, and large-scale coherency in populations that share a mortality regime (western Europe, for example). Many related papers can be found on Professor Ronald Lee's website.

Implementations

There are surprisingly few software packages for forecasting with the Lee–Carter model.

  • LCFIT is a web-based package with interactive forms.
  • Professor Rob J. Hyndman provides an R package for demography that includes routines for creating and forecasting a Lee–Carter model.
  • Alternatives in R include the StMoMo package of Villegas, Millossovich and Kaishev (2015).
  • Professor German Rodriguez provides code for the Lee–Carter Model using Stata.
  • Using Matlab, Professor Eric Jondeau and Professor Michael Rockinger have put together the Longevity Toolbox for parameter estimation.

References