Distribution-free maximum likelihood for binary responses


This article uses the latent utility model[1] as an example of a binary response model. The intuition behind the latent utility model is that a respondent picks the choice that gives her the highest utility. Because utility is not observable, the latent utility is assumed to be linear in some explanatory variables that affect the utility of the choice to the respondent, plus an additive response error that captures the randomness of the choice-making process. In this model, the choice is: [math]\displaystyle{ Y_{t}=1[X_{1,t}\beta+\varepsilon_{1}\gt X_{2,t}\beta+\varepsilon_{2}] }[/math], where [math]\displaystyle{ X_{1,t}, X_{2,t} }[/math] are two vectors of explanatory covariates, [math]\displaystyle{ \varepsilon_{1} }[/math] and [math]\displaystyle{ \varepsilon_{2} }[/math] are i.i.d. response errors, and

[math]\displaystyle{ X_{1,t}\beta+\varepsilon_1 \text{ and } X_{2,t}\beta+\varepsilon_2 }[/math]

are the latent utilities of choosing choices 1 and 2. The log-likelihood function is then:

[math]\displaystyle{ Q=\sum_{t=1}^N Y_t \log(P[X_{1,t}\beta-X_{2,t}\beta\gt \varepsilon_2-\varepsilon_1])+(1-Y_t) \log(1-P[X_{1,t}\beta-X_{2,t}\beta\gt \varepsilon_2-\varepsilon_1]) }[/math]
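The choice rule and the generic log-likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simulate_choices` and `log_likelihood` are hypothetical, the errors are drawn as normal purely for the sake of the simulation, and the choice probabilities are passed in as a list rather than derived from any particular distribution.

```python
import math
import random


def simulate_choices(x1, x2, beta, sigma=1.0, seed=0):
    """Draw Y_t = 1[X_{1,t} beta + eps_1 > X_{2,t} beta + eps_2].

    Assumption for this sketch: eps_1 and eps_2 are i.i.d. N(0, sigma^2).
    """
    rng = random.Random(seed)
    y = []
    for a, b in zip(x1, x2):
        u1 = sum(ai * k for ai, k in zip(a, beta)) + rng.gauss(0.0, sigma)
        u2 = sum(bi * k for bi, k in zip(b, beta)) + rng.gauss(0.0, sigma)
        y.append(1 if u1 > u2 else 0)
    return y


def log_likelihood(y, p):
    """Q = sum_t Y_t log p_t + (1 - Y_t) log(1 - p_t), with p_t = P[Y_t = 1]."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for yt, pt in zip(y, p):
        pt = min(max(pt, eps), 1.0 - eps)
        total += yt * math.log(pt) + (1 - yt) * math.log(1.0 - pt)
    return total
```

Any model for the choice probability, parametric or distribution-free, can be plugged in through the argument `p`.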

If a distributional assumption is imposed on the response error, the log-likelihood function has a specific closed-form representation.[2] For instance, if each response error is assumed to be distributed as [math]\displaystyle{ N(0,\sigma^2) }[/math], then the log-likelihood function can be rewritten as:

[math]\displaystyle{ Q=\sum_{t=1}^N Y_t \log\left(\Phi\left[\frac{X_{1,t}\beta-X_{2,t} \beta}{\sqrt{2}\sigma} \right]\right) + (1-Y_t) \log \left(\Phi \left[ \frac{X_{2,t}\beta-X_{1,t}\beta}{\sqrt{2}\sigma} \right] \right) }[/math]
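The factor of the square root of 2 in the denominator arises because the difference of two independent normal errors, each with variance σ², is normal with variance 2σ². A minimal sketch of this log-likelihood follows. The function names are hypothetical, and the normal CDF is computed from `math.erf`. Since only the ratio of β to σ enters the formula, σ is fixed at 1 by default as a normalization assumed for this sketch.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_log_likelihood(beta, x1, x2, y, sigma=1.0):
    """Log-likelihood under i.i.d. N(0, sigma^2) response errors."""
    total = 0.0
    for a, b, yt in zip(x1, x2, y):
        # Index difference X_{1,t} beta - X_{2,t} beta.
        index = sum((ai - bi) * k for ai, bi, k in zip(a, b, beta))
        z = index / (math.sqrt(2.0) * sigma)
        # Phi(-z) = 1 - Phi(z), so one CDF evaluation covers both terms.
        p1 = min(max(std_normal_cdf(z), 1e-12), 1.0 - 1e-12)
        total += yt * math.log(p1) + (1 - yt) * math.log(1.0 - p1)
    return total
```

The function can be passed to any numerical optimizer over `beta` to obtain the maximum likelihood estimate.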

where [math]\displaystyle{ \Phi }[/math] is the cumulative distribution function (CDF) of the standard normal distribution. Although [math]\displaystyle{ \Phi }[/math] has no closed-form representation, its derivative (the standard normal density) does, so the first-order condition can be written out explicitly and the maximum likelihood estimate obtained by solving it, in practice numerically. Alternatively, if each response error is assumed to follow a Gumbel distribution with location 0 and scale [math]\displaystyle{ \sigma }[/math], then the log-likelihood function can be rewritten as:

[math]\displaystyle{ Q=\sum_{t=1}^N Y_t \log\left(F \left[\frac{X_{1,t}\beta-X_{2,t}\beta}{\sigma}\right]\right) + (1-Y_t) \log\left(F \left[ \frac{X_{2,t}\beta-X_{1,t}\beta}{\sigma} \right] \right) }[/math]

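where F is the CDF of the standard logistic distribution, which has a closed-form representation. The logistic CDF appears because the difference of two i.i.d. Gumbel errors follows a logistic distribution.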
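The logistic case can be sketched in the same way. Here the closed form F(z) = 1/(1 + exp(-z)) lets the log-probabilities be written directly; the helper `softplus` and the function name `logit_log_likelihood` are hypothetical names introduced for this illustration, and σ defaults to 1 as an assumed normalization.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def logit_log_likelihood(beta, x1, x2, y, sigma=1.0):
    """Log-likelihood when eps_2 - eps_1 follows a logistic distribution."""
    total = 0.0
    for a, b, yt in zip(x1, x2, y):
        index = sum((ai - bi) * k for ai, bi, k in zip(a, b, beta))
        z = index / sigma
        # log F(z) = -softplus(-z) and log F(-z) = -softplus(z).
        total += -yt * softplus(-z) - (1 - yt) * softplus(z)
    return total
```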

Both of the models above rely on a distributional assumption about the response error term. Adding a specific distributional assumption can make the model computationally tractable, because a closed-form representation exists. But if the distribution of the error term is misspecified, estimates based on that assumption will be inconsistent. To obtain a more robust estimator, models that do not depend on a distributional assumption can be used. The basic idea of the distribution-free model is to replace the two probability terms in the log-likelihood function with other weights. The general form of the log-likelihood function can be written as:

[math]\displaystyle{ Q= \sum_{t=1}^N Y_t \log(W_1(X_{1,t}\beta,X_{2,t}\beta))+(1-Y_t)\log(W_0 (X_{1,t}\beta, X_{2,t}\beta)) }[/math]

For instance, Manski (1975) proposed a discrete weighting scheme for the multi-response model,[3] which in the binary context can be represented as:

[math]\displaystyle{ W_1(X_{1,t}\beta,X_{2,t}\beta)=w_1 1[X_{1,t}\beta-X_{2,t}\beta\gt 0]+w_0 1[X_{1,t}\beta-X_{2,t}\beta\lt 0], }[/math]
[math]\displaystyle{ W_0(X_{1,t}\beta,X_{2,t}\beta)=1-W_1(X_{1,t}\beta,X_{2,t}\beta) }[/math]
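A minimal sketch of the objective under this discrete weighting scheme is given below. The values `w1=0.75` and `w0=0.25` are arbitrary illustrative constants in (0,1), not values taken from the source. The function names are hypothetical, and observations with an index difference of exactly zero are skipped, which is an assumption of this sketch because the weight as written is zero there.

```python
import math


def index_diff(beta, a, b):
    """X_{1,t} beta - X_{2,t} beta for one observation."""
    return sum((ai - bi) * k for ai, bi, k in zip(a, b, beta))


def discrete_weight_objective(beta, x1, x2, y, w1=0.75, w0=0.25):
    """Q with W_1 = w1 * 1[d > 0] + w0 * 1[d < 0] and W_0 = 1 - W_1."""
    total = 0.0
    for a, b, yt in zip(x1, x2, y):
        d = index_diff(beta, a, b)
        if d == 0.0:
            continue  # assumption: ties are dropped to avoid log(0)
        weight1 = w1 if d > 0 else w0
        total += yt * math.log(weight1) + (1 - yt) * math.log(1.0 - weight1)
    return total


def score(beta, x1, x2, y):
    """Number of observations whose choice matches the sign of the index."""
    return sum(
        1 for a, b, yt in zip(x1, x2, y)
        if (index_diff(beta, a, b) > 0) == (yt == 1)
    )
```

With `w0 = 1 - w1` and `w1 > 0.5`, each correctly predicted observation contributes `log(w1)` and each incorrect one contributes `log(1 - w1)`, so a higher objective corresponds to a higher score. The objective is a step function of `beta`: rescaling `beta` by any positive constant leaves it unchanged.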

and [math]\displaystyle{ w_1 }[/math] and [math]\displaystyle{ w_0 }[/math] are two constants in (0,1). The intuition behind this weighting scheme is that the probability of a choice depends only on the relative order of the deterministic parts of the utilities. Under the discrete weighting scheme, the resulting estimator, known as the Maximum Score Estimator, has unfavorable asymptotic properties: it converges at the cube-root rate and its limiting distribution is not normal.[4] Horowitz (1992)[5] therefore proposed a smoothed weighting scheme, which can be represented as:

[math]\displaystyle{ W_1(X_{1,t}\beta,X_{2,t}\beta)=K[X_{1,t}\beta-X_{2,t}\beta], \quad W_0(X_{1,t}\beta,X_{2,t}\beta)=1-K[X_{1,t}\beta-X_{2,t}\beta] }[/math]

where the weight function K has to satisfy the following conditions:

(1) |K| is bounded over R;

(2) [math]\displaystyle{ \lim_{u\to-\infty} K(u) =0 \text{ and } \lim_{u\to\infty} K (u) =1; }[/math]

(3) [math]\displaystyle{ \dot K(u) = \dot K (-u) }[/math]

Here, the weight function is analogous to a cumulative distribution function, but it can be more general and flexible than the weight functions in models based on a specific distributional assumption. The estimator under this weighting scheme is called the Smoothed Maximum Score Estimator. Because its objective is smooth, it is usually more computationally tractable than the Maximum Score Estimator, and it is more robust than estimators that rely on distributional assumptions.
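The smoothed weighting scheme can be illustrated with a short sketch. Two things here are assumptions not stated above: the standard normal CDF is used as one possible choice of K (it is bounded, has limits 0 and 1, and has a symmetric derivative), and a bandwidth parameter `h` rescales the index before K is applied. The function names are hypothetical, and the objective follows the general log-weight form given earlier in this article rather than any specific published formulation.

```python
import math


def normal_cdf_kernel(u):
    """One admissible K (assumed here): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def smoothed_objective(beta, x1, x2, y, h=0.5, kernel=normal_cdf_kernel):
    """Q with W_1 = K(d / h) and W_0 = 1 - K(d / h).

    d is the index difference X_{1,t} beta - X_{2,t} beta and h is an
    assumed bandwidth; smaller h brings K closer to a step function.
    """
    total = 0.0
    for a, b, yt in zip(x1, x2, y):
        d = sum((ai - bi) * k for ai, bi, k in zip(a, b, beta))
        weight1 = min(max(kernel(d / h), 1e-12), 1.0 - 1e-12)
        total += yt * math.log(weight1) + (1 - yt) * math.log(1.0 - weight1)
    return total
```

Unlike the discrete scheme, this objective varies continuously with `beta`, so gradient-based optimizers can be applied to it.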

References

  1. Walker, Joan; Ben-Akiva, Moshe (2002). "Generalized random utility model". Mathematical Social Sciences 43 (3): 303–343. doi:10.1016/S0165-4896(02)00023-9. 
  2. Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass: MIT Press. pp. 457–460. ISBN 0-262-23219-7. 
  3. Manski, Charles F. (1975). "Maximum Score Estimation of the Stochastic Utility Model of Choice". Journal of Econometrics 3 (3): 205–228. doi:10.1016/0304-4076(75)90032-9. 
  4. Kim, Jeankyung; Pollard, David (1990). "Cube Root Asymptotics". Annals of Statistics 18 (1): 191–219. 
  5. Horowitz, Joel L. (1992). "A Smoothed Maximum Score Estimator for the Binary Response Model". Econometrica 60 (3): 505–531. doi:10.2307/2951582.