Binary response model with latent variable

From HandWiki

In statistics, the binary response model with latent variable is constructed as:

[math]\displaystyle{ y=1 [y^*\gt 0] }[/math]

where [math]\displaystyle{ y^*=x\beta +\varepsilon }[/math] and [math]\displaystyle{ \varepsilon \mid x\sim G }[/math]. This model can be applied in many economic contexts. For instance, [math]\displaystyle{ y }[/math] may be a manager's decision whether to invest in a program, [math]\displaystyle{ y^* }[/math] the expected net discounted cash flow, and [math]\displaystyle{ x }[/math] a vector of variables that can affect the cash flow of the program. The manager then invests only when she expects the net discounted cash flow to be positive.[1]

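Usually, the error term [math]\displaystyle{ \varepsilon }[/math] is assumed to follow a normal distribution with constant variance (homoskedasticity) conditional on the exogenous explanatory variables [math]\displaystyle{ x }[/math]; that is, [math]\displaystyle{ G }[/math] is the standard normal distribution. Under this assumption the model is called the standard probit model,[2] and [math]\displaystyle{ P(y=1\mid x) = \Phi (x\beta) }[/math], where [math]\displaystyle{ \Phi }[/math] is the standard normal cumulative distribution function.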
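The latent-variable construction can be checked numerically. The following is a minimal sketch, assuming a single scalar regressor and hypothetical values for [math]\displaystyle{ x }[/math] and [math]\displaystyle{ \beta }[/math] chosen only for illustration: it draws the latent variable with a standard normal error and compares the share of positive draws with the standard normal distribution function evaluated at [math]\displaystyle{ x\beta }[/math].

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_share(x, beta, n, seed=0):
    """Share of draws with y = 1[x*beta + eps > 0], where eps ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y_star = x * beta + rng.gauss(0.0, 1.0)  # latent variable y*
        hits += 1 if y_star > 0 else 0           # observed binary response y
    return hits / n


# Hypothetical scalar regressor value and coefficient (not from the article).
x, beta = 0.8, 0.5
share = simulate_share(x, beta, n=50_000)

# The simulated share should be close to Phi(x * beta).
print(share, norm_cdf(x * beta))
```

With a large number of draws the two printed numbers should agree up to simulation noise.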

When the variance of [math]\displaystyle{ \varepsilon }[/math] conditional on [math]\displaystyle{ x }[/math] is not constant but depends on [math]\displaystyle{ x }[/math], heteroskedasticity arises. For example, suppose [math]\displaystyle{ y^*= \beta_0+\beta_1 x_1+\varepsilon }[/math] and [math]\displaystyle{ \varepsilon\mid x \sim N (0,x^2_1) }[/math], where [math]\displaystyle{ x_1 }[/math] is a continuous positive explanatory variable. Under heteroskedasticity, the probit estimator for [math]\displaystyle{ \beta }[/math] is usually inconsistent, and hence most tests about the coefficients are invalid. More importantly, the estimator for [math]\displaystyle{ P (y=1\mid x) }[/math] becomes inconsistent too. To deal with this problem, the original model needs to be transformed to be homoskedastic. In the same example, because [math]\displaystyle{ x_1 \gt 0 }[/math], [math]\displaystyle{ 1[\beta_0+\beta_1 x_1+\varepsilon\gt 0] }[/math] can be rewritten as [math]\displaystyle{ 1[\beta_0/x_1+\beta_1+\varepsilon/x_1\gt 0] }[/math], where [math]\displaystyle{ \varepsilon/x_1\mid x\sim N(0,1) }[/math]. Therefore, [math]\displaystyle{ P(y=1\mid x) = \Phi (\beta_1 + \beta_0/x_1) }[/math], and running probit on the regressors [math]\displaystyle{ (1, 1/x_1) }[/math] yields a consistent estimator of the conditional probability [math]\displaystyle{ P(y=1\mid x). }[/math]
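The transformation in this example can be illustrated by simulation. The sketch below is not a reference implementation: the parameter values, the uniform distribution of [math]\displaystyle{ x_1 }[/math] and the helper `fit_probit` (a small Fisher-scoring routine written for this illustration) are all assumptions. It generates data with error standard deviation [math]\displaystyle{ x_1 }[/math] and fits a probit on [math]\displaystyle{ (1, 1/x_1) }[/math], so the fitted constant should approximate [math]\displaystyle{ \beta_1 }[/math] and the fitted slope [math]\displaystyle{ \beta_0 }[/math].

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_probit(w, y, iters=50, tol=1e-10):
    """Fisher scoring for P(y=1|w) = Phi(a + b*w); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, yi in zip(w, y):
            z = a + b * wi
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(z)
            s = d * (yi - p) / (p * (1.0 - p))   # score weight
            k = d * d / (p * (1.0 - p))          # expected information weight
            g0 += s
            g1 += s * wi
            h00 += k
            h01 += k * wi
            h11 += k * wi * wi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det         # solve the 2x2 system
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


rng = random.Random(1)
beta0, beta1 = 1.0, -0.5                          # hypothetical true values
x1 = [rng.uniform(0.5, 3.0) for _ in range(20_000)]
# Heteroskedastic error: eps | x ~ N(0, x1^2), i.e. eps = x1 * N(0, 1).
y = [1 if beta0 + beta1 * v + v * rng.gauss(0.0, 1.0) > 0 else 0 for v in x1]

# Probit on (1, 1/x1): constant estimates beta1, slope estimates beta0.
const_hat, slope_hat = fit_probit([1.0 / v for v in x1], y)
print(const_hat, slope_hat)
```

Note that the roles of the coefficients are swapped after the transformation: the constant of the transformed regression corresponds to the original slope.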

When the normality assumption does not hold, i.e. [math]\displaystyle{ G }[/math] is no longer a normal distribution, a functional form misspecification issue arises. If the model is still estimated as a probit model, the coefficient estimators are inconsistent. For instance, if [math]\displaystyle{ G }[/math] is the logistic distribution in the true model but the model is estimated by probit, the estimates will generally be smaller in magnitude than the true values. However, the inconsistency of the coefficient estimates is practically irrelevant, because the estimates of the partial effects, [math]\displaystyle{ \partial P(y=1\mid x)/\partial x_i }[/math], given by the probit model are very close to those given by the true logit model.[3] To avoid distributional misspecification altogether, one can adopt a very general distributional assumption for the error term, so that many different types of distribution are included in the model. The cost is heavier computation and lower accuracy, because the number of parameters increases.[4] In most practical cases where the distributional form is misspecified, the estimators of the coefficients are inconsistent, but those of the conditional probability and the partial effects are still very good.
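The logit-versus-probit comparison can be sketched in the same way. The code below assumes hypothetical true coefficients, a standard normal regressor and standard logistic errors, and reuses an illustrative Fisher-scoring helper `fit_probit` written for this example. It fits a (misspecified) probit and compares the average partial effect implied by the probit fit with the one implied by the true logit model.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_probit(w, y, iters=50, tol=1e-10):
    """Fisher scoring for P(y=1|w) = Phi(a + b*w); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, yi in zip(w, y):
            z = a + b * wi
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(z)
            s = d * (yi - p) / (p * (1.0 - p))
            k = d * d / (p * (1.0 - p))
            g0 += s
            g1 += s * wi
            h00 += k
            h01 += k * wi
            h11 += k * wi * wi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def logistic_draw(rng):
    """Standard logistic draw by inverting the logistic CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return math.log(u / (1.0 - u))


rng = random.Random(2)
b0, b1 = 0.5, 1.0                                  # hypothetical true values
x = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
y = [1 if b0 + b1 * xi + logistic_draw(rng) > 0 else 0 for xi in x]

a_hat, b_hat = fit_probit(x, y)                    # misspecified probit fit

# Average partial effect of x under the true logit and the fitted probit.
ape_true = sum(b1 * logit_cdf(b0 + b1 * xi) * (1.0 - logit_cdf(b0 + b1 * xi))
               for xi in x) / len(x)
ape_probit = sum(b_hat * norm_pdf(a_hat + b_hat * xi) for xi in x) / len(x)
print(b_hat, ape_true, ape_probit)
```

In this setup the probit slope should come out visibly below the true coefficient, while the two average partial effects should be close to each other.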

References

  1. For a detailed application example, refer to: Tetsuo Yai, Seiji Iwakura, Shigeru Morichi, Multinomial probit with structured covariance for route choice behavior, Transportation Research Part B: Methodological, Volume 31, Issue 3, June 1997, Pages 195–207, ISSN 0191-2615
  2. Bliss, C. I. (1934). "The Method of Probits". Science 79 (2037): 38–39.
  3. Greene, W. H. (2003), Econometric Analysis, Prentice Hall, Upper Saddle River, NJ.
  4. For more details, refer to: Cappé, O., Moulines, E. and Ryden, T. (2005): “Inference in Hidden Markov Models”, Springer-Verlag New York, Chapter 2.

Bibliography

  • Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass., p. 479.