Fay-Herriot model

The Fay–Herriot model is a statistical model that includes distinct variation for each of several subgroups of observations. It is an area-level model, meaning that some input data are associated with sub-aggregates such as regions, jurisdictions, or industries, and the model produces estimates for those subgroups. It is applied in small area estimation, where there is a lot of data overall but not much for each subgroup.

The subgroups are determined in advance of estimation and are built into the model structure. The model combines, by averaging, estimates of the fixed-effects type and of the random-effects type. It is typically used to adjust for group-related differences in some dependent variable.

In random effects models like the Fay–Herriot model, estimation is built on the assumption that the effects associated with subgroups are drawn independently from a normal (Gaussian) distribution, whose variance is estimated from the data on the subgroups. For many systematically different groups it is more common to use a fixed-effects model instead. A mixed model with random effects, like the Fay–Herriot model, is preferred if there are not enough observations per group to estimate the fixed effects reliably, or if for some reason fixed effects would not be consistently estimated.

The Fay–Herriot model is a two-stage hierarchical model. The parameters of the distributions within the groups are often assumed to be independent, or they are assumed to be correlated with those measured for another variable.

Model structure and assumptions

In the classical Fay–Herriot (FH) model, the data used for estimation are aggregate estimates for the subgroups, based on surveys.
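
As a sketch, the classical area-level model is usually written in two stages. The symbols here are notation introduced for illustration and do not come from the text above: [math]\displaystyle{ y_i }[/math] is the direct survey estimate for area i, [math]\displaystyle{ \theta_i }[/math] the true area quantity, [math]\displaystyle{ D_i }[/math] the sampling variance (treated as known), [math]\displaystyle{ x_i }[/math] a vector of area-level covariates, and [math]\displaystyle{ u_i }[/math] the area random effect.

[math]\displaystyle{ y_i = \theta_i + e_i, \qquad e_i \sim N(0, D_i) }[/math]

[math]\displaystyle{ \theta_i = x_i^\top \beta + u_i, \qquad u_i \sim N(0, \sigma_u^2) }[/math]

The first stage describes sampling error around the true area value; the second links the true values to the covariates through a regression with an area-specific deviation.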

The model can also be applied to microdata. Consider rows of observations numbered j=1 to J, in groups from i=1 to I, with predictive data [math]\displaystyle{ X_{ij} }[/math] for dependent variable [math]\displaystyle{ Y_{ij} }[/math]. If the model includes random effects only, it can be expressed by:

[math]\displaystyle{ Y_{ij} = \mu + \beta X_{ij} + U_i + \epsilon_{ij} }[/math]

A probability distribution is assumed for the random effects [math]\displaystyle{ U_i }[/math], typically a normal distribution. A different distribution can be assumed, e.g. if the sample distribution is known to have heavy tails.[1]
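
To make the structure of this equation concrete, the following is a minimal simulation sketch. It assumes normal random effects and normal sampling variation; the function name `simulate_groups`, the parameter values, and the uniform draw for the predictor are illustrative assumptions, not part of the model definition.

```python
import random


def simulate_groups(n_groups=50, n_per_group=5, mu=1.0, beta=2.0,
                    sd_u=1.0, sd_eps=0.5, seed=0):
    """Draw rows (i, x, y) from Y_ij = mu + beta * X_ij + U_i + eps_ij.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_groups):
        # One random effect per group, shared by every row in that group.
        u_i = rng.gauss(0.0, sd_u)
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 1.0)      # assumed predictor distribution
            eps = rng.gauss(0.0, sd_eps)   # row-level sampling variation
            rows.append((i, x, mu + beta * x + u_i + eps))
    return rows


rows = simulate_groups(n_groups=4, n_per_group=3, seed=1)
```

Because every row in a group shares the same draw of the random effect, deviations from the regression line are correlated within a group, which is what distinguishes this model from an ordinary regression.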

Often fixed effects are included, making it a mixed model, with auxiliary data and economic or probability assumptions that make it possible to identify these effects separately from one another and from sampling variation [math]\displaystyle{ \epsilon_{ij} }[/math].[2]

Estimation

The parameters of interest, including the random effects, are estimated together iteratively. Methods include maximum likelihood estimation, the method of moments, and Bayesian methods.[3][4][5]
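
One moment-based scheme can be sketched as follows. This is a minimal illustration, not the method of any particular package: it assumes an intercept-only area-level model with direct estimates `y`, known sampling variances `D`, and random-effect variance `A`, and it solves the moment equation sum((y_i - b)^2 / (A + D_i)) = m - 1 by bisection, where b is the weighted mean for the current `A`. The function name `fh_moment_variance` and all variable names are hypothetical.

```python
def fh_moment_variance(y, D, n_iter=200):
    """Moment estimate of the random-effect variance A (intercept-only sketch).

    y: direct estimates per area; D: known sampling variances (all > 0).
    Returns (A, b), where b is the weighted mean at the solution.
    """
    m = len(y)

    def wls_mean(A):
        # Weighted least squares intercept with weights 1 / (A + D_i).
        w = [1.0 / (A + d) for d in D]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    def h(A):
        # Moment condition; decreasing in A, zero at the estimate.
        b = wls_mean(A)
        return sum((yi - b) ** 2 / (A + d) for yi, d in zip(y, D)) - (m - 1)

    if h(0.0) <= 0.0:
        # Residual variation is no larger than sampling error alone explains.
        return 0.0, wls_mean(0.0)

    lo, hi = 0.0, 1.0
    while h(hi) > 0.0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(n_iter):     # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return A, wls_mean(A)


A_hat, b_hat = fh_moment_variance([0.0, 2.0, 4.0, 6.0, 8.0], [1.0] * 5)
```

With equal sampling variances the equation has a closed form (the sample variance of `y` minus the common sampling variance), which gives a convenient check on the iteration. Truncating the estimate at zero reflects that a variance cannot be negative.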

Fay–Herriot models can be characterized as mixed models, in a hierarchical form,[6] or as a multilevel regression with poststratification.[7][8][9][10]

The resulting estimate for each area (subgroup) is a weighted average of the direct estimate and an indirect estimate, with weights based on estimates of the variances.
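
The weighting can be sketched in a few lines. In the usual formulation, the weight on the direct estimate for area i is gamma_i = sigma2_u / (sigma2_u + D_i), where sigma2_u is the estimated random-effect variance and D_i the sampling variance of the direct estimate. The function and argument names below are hypothetical, and the indirect (regression-based) estimates are assumed to have been computed already.

```python
def fh_composite(direct, sampling_var, indirect, sigma2_u):
    """Shrink each direct estimate toward its indirect estimate.

    direct: survey estimates per area
    sampling_var: sampling variances D_i (assumed > 0)
    indirect: model-based (synthetic) estimates per area
    sigma2_u: estimated variance of the area random effects
    """
    combined = []
    for y_i, d_i, s_i in zip(direct, sampling_var, indirect):
        gamma = sigma2_u / (sigma2_u + d_i)   # weight on the direct estimate
        combined.append(gamma * y_i + (1.0 - gamma) * s_i)
    return combined


estimates = fh_composite([10.0, 10.0], [1.0, 9.0], [0.0, 0.0], sigma2_u=1.0)
```

An area with a noisy direct estimate (large sampling variance) gets a small weight and is pulled strongly toward the indirect estimate; an area with a precise direct estimate stays close to it.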

Tests of consistency

For random effects models to make consistent estimates, it is necessary that the subgroup-specific effects be uncorrelated with the other predictor variables in the model. If the subgroup-specific effects are correlated with those predictors, then random effects estimation would be biased but fixed effects estimation would not be biased.

That correlation can be tested by running both the fixed effects and the random effects models and then applying the Hausman specification test. The test may fail to reject the hypothesis of no correlation even when it is false (a Type II error), so it cannot be definitively concluded that random effects estimation is unbiased even if the Hausman test fails to reject.
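
For a single shared coefficient, the Hausman statistic reduces to the squared difference between the two estimates divided by the difference of their variances, compared against a chi-squared distribution with one degree of freedom. The sketch below assumes the two models have already been fitted elsewhere; the function name `hausman_1d` and its inputs are hypothetical, and the general multi-coefficient case would use the matrix form of the same statistic.

```python
import math


def hausman_1d(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient.

    b_fe, var_fe: fixed-effects estimate and its variance
    b_re, var_re: random-effects estimate and its variance
    """
    diff_var = var_fe - var_re
    if diff_var <= 0.0:
        # In finite samples the variance difference can be non-positive,
        # in which case this simple form of the test is not usable.
        raise ValueError("var_fe must exceed var_re for this form of the test")
    h = (b_fe - b_re) ** 2 / diff_var
    # Chi-squared(1) upper tail: P(X > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


h_stat, p_value = hausman_1d(b_fe=1.2, var_fe=0.05, b_re=1.0, var_re=0.01)
```

A small p-value is evidence that the two estimators disagree, which points to correlation between the subgroup effects and the predictor. A large p-value is only a failure to reject, as noted above.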

History

Robert Fay and Roger Herriot of the U.S. Census Bureau developed the model to make estimates for populations in each of many geographic regions. The authors referred to the method as a James-Stein procedure and did not use the term "random effects."[11] It is an area-level model.[12] The model has been used for the same purpose, called small-area estimation, by other U.S. government agencies.[6][13]

Rao and Molina's small area estimation text is sometimes characterized as a definitive source about the FH model.[14]

Applications

The FH model is used extensively in the Small Area Income and Poverty Estimates (SAIPE) program of the U.S. Census Bureau.[15]

References

  1. Julie B. Gershunskaya; Terrance D. Savitsky. Dependent Latent Effects Modeling for Survey Estimation with Application to the Current Employment Statistics Surveys. JSM Proceedings 2016.
  2. Pushpal K. Mukhopadhyay and Allen McDowell. Small Area Estimation for Survey Data Analysis Using SAS® Software Paper 336-2011. SAS Institute Inc.
  3. Roberto Benavent; Domingo Morales. 2016. Multivariate Fay–Herriot models for small area estimation. Computational Statistics & Data Analysis 94, 372-390 https://doi.org/10.1016/j.csda.2015.07.013
  4. Aaron T. Porter; Scott H. Holan; Christopher K. Wikle; Noel Cressie. 2013. Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates arXiv:1303.6668
  5. Isabel Molina; Yolanda Marhuenda. 2015. sae: An R Package for Small Area Estimation. The R Journal 7:1, pages 81-98.
  6. Cruze, Nathan B. 2018. Fitting a Bayesian Fay-Herriot Model. Presentation to WSS.
  7. Aaron T. Porter; Scott H. Holan; Christopher K. Wikle; Noel Cressie. Spatial Fay-Herriot Models for Small Area Estimation with Functional Covariates
  8. Julie Gershunskaya; Terrance D. Savitsky. 2018. Robust estimation in the presence of deviations from linearity in small domain models. Joint Statistical Meetings 2018, Survey Research Methods Section. pp 595-614.
  9. Lou Rizzo; J. Michael Brick. 2017. Literature Search on Combining Survey and Administrative Records. Task Order 2, BLS BPA 1625DC-17-A-0001. Page C-5 explains the Fay-Herriot parameter estimator after running the model; it's not a linear regression whose coefficient is used directly.
  10. Brendan Halpin. 2012. Fixed and random effects models Sociology course notes. University of Limerick.
  11. Fay, Robert E.; Roger A. Herriot. 1979. Estimates of income for small places: An application of James-Stein procedures to Census data. Journal of the American Statistical Association, Vol. 74, No. 366 (Jun., 1979), pp. 269-277. JSTOR
  12. "Comparison of unit level and area level small area estimators. 3. Area level model". 22 June 2016. https://www150.statcan.gc.ca/n1/pub/12-001-x/2016001/article/14540/03-eng.htm. 
  13. Lee Baker; Taylor Le; Nicholas Rose. 2017. Statistical Agency Use of Macro Editing in Industry-Area Employment Estimation. Joint Statistical Meetings, Social Statistics Section.
  14. J. N. K. Rao and Isabel Molina. 2015. Small Area Estimation. Wiley & Sons. ISBN:9781118735787
  15. Small Area Income and Poverty Estimates (SAIPE) Program at U.S. Census Bureau site