Two-step M-estimators involving MLE

From HandWiki

The two-step M-estimator involving a maximum likelihood estimator (MLE) in the first step is a special case of the general two-step M-estimator. Thus, consistency and asymptotic normality of the estimator follow from the general result on two-step M-estimators.

Description

However, when the first-step estimator is an MLE, then under some assumptions the two-step M-estimator is more efficient (i.e., has a smaller asymptotic variance) than the same M-estimator with the first-step parameter fixed at its true value.[1]

Let [math]\displaystyle{ \{V_i, W_i, Z_i\}_{i=1}^{n} }[/math] be a random sample. The second-step M-estimator [math]\displaystyle{ \widehat{\theta} }[/math] is defined as:

[math]\displaystyle{ \widehat{\theta} = \underset{\theta\in\Theta}{\operatorname{arg\,max}}\sum_{i}m(v_i, w_i, z_i : \theta, \widehat{\gamma}) }[/math]

where [math]\displaystyle{ \widehat{\gamma} }[/math] is the parameter estimated by maximum likelihood in the first step. For the MLE,

[math]\displaystyle{ \widehat{\gamma} = \underset{\gamma\in\Gamma}{\operatorname{arg\,max}}\sum_{i}\log f(v_{i} : z_{i}, \gamma) }[/math]

where f is the conditional density of V given Z. Now, suppose that, given Z, V is conditionally independent of W. This assumption is called the conditional independence assumption or selection on observables.[1][2] Intuitively, it means that Z is a good predictor of V, so that once conditioned on Z, V has no systematic dependence on W. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:

E[∇θ s(θ00)]−1 E[g(θ00 )g(θ00 )']E[∇θ s(θ00)]−1

where [math]\displaystyle{ g(\theta,\gamma) := s(\theta,\gamma) - \mathrm{E}[\,s(\theta,\gamma)d(\gamma)'\,]\,\mathrm{E}[\,d(\gamma)d(\gamma)'\,]^{-1}d(\gamma), }[/math]

[math]\displaystyle{ s(\theta,\gamma) := \nabla_\theta m(V, W, Z : \theta, \gamma) }[/math], [math]\displaystyle{ d(\gamma) := \nabla_\gamma \log f(V : Z, \gamma) }[/math], and [math]\displaystyle{ \nabla }[/math] denotes the partial derivative with respect to a row vector. In the case where [math]\displaystyle{ \gamma_0 }[/math] is known, the asymptotic variance is [math]\displaystyle{ \mathrm{E}[\nabla_\theta s(\theta_0,\gamma_0)]^{-1}\,\mathrm{E}[s(\theta_0,\gamma_0)s(\theta_0,\gamma_0)']\,\mathrm{E}[\nabla_\theta s(\theta_0,\gamma_0)]^{-1} }[/math], and therefore, unless [math]\displaystyle{ \mathrm{E}[\,s(\theta_0,\gamma_0)d(\gamma_0)'\,]=0 }[/math], the two-step M-estimator is more efficient than the usual M-estimator. This fact suggests that even when [math]\displaystyle{ \gamma_0 }[/math] is known a priori, there is an efficiency gain from estimating [math]\displaystyle{ \gamma }[/math] by MLE. An application of this result can be found, for example, in treatment effect estimation.[1]
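The efficiency ranking can be seen from a standard projection argument (sketched here for completeness; all expectations are evaluated at [math]\displaystyle{ (\theta_0,\gamma_0) }[/math] and arguments are suppressed): [math]\displaystyle{ g }[/math] is the residual from the population linear projection of [math]\displaystyle{ s }[/math] on the first-step score [math]\displaystyle{ d }[/math], so

[math]\displaystyle{ \mathrm{E}[gg'] = \mathrm{E}[ss'] - \mathrm{E}[sd']\,\mathrm{E}[dd']^{-1}\mathrm{E}[ds'], }[/math]

which is smaller than [math]\displaystyle{ \mathrm{E}[ss'] }[/math] in the positive semi-definite sense, and strictly smaller whenever [math]\displaystyle{ \mathrm{E}[sd'] \neq 0 }[/math]. Pre- and post-multiplying by [math]\displaystyle{ \mathrm{E}[\nabla_\theta s]^{-1} }[/math] preserves this ordering, which yields the comparison of the two asymptotic variances.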

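As a concrete illustration, consider the following minimal sketch (the data-generating process and objective function here are illustrative assumptions, not from the source): let [math]\displaystyle{ V \sim N(\gamma_0, 1) }[/math] be independent of [math]\displaystyle{ W \sim N(\theta_0, 1) }[/math], with second-step objective [math]\displaystyle{ m(v,w:\theta,\gamma) = -\tfrac{1}{2}(w - \theta - (v - \gamma))^2 }[/math]. Then [math]\displaystyle{ s = w - \theta - (v - \gamma) }[/math], the first-step score is [math]\displaystyle{ d = v - \gamma }[/math], and [math]\displaystyle{ \mathrm{E}[sd] = -\operatorname{Var}(V) \neq 0 }[/math], so an efficiency gain from the first-step MLE is expected. A Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, gamma0 = 1.0, 0.5
n, reps = 200, 4000
est_known, est_twostep = [], []
for _ in range(reps):
    v = rng.normal(gamma0, 1.0, size=n)  # V ~ N(gamma0, 1)
    w = rng.normal(theta0, 1.0, size=n)  # W ~ N(theta0, 1), independent of V
    # Step 1 (MLE): gamma_hat maximizes sum_i log f(v_i; gamma) -> sample mean
    gamma_hat = v.mean()
    # Step 2 (M-estimation): theta maximizes sum_i -(w_i - theta - (v_i - gamma))^2 / 2,
    # solved in closed form by the sample mean of w_i - (v_i - gamma)
    est_twostep.append((w - (v - gamma_hat)).mean())  # plug in first-step MLE
    est_known.append((w - (v - gamma0)).mean())       # plug in the true gamma0
ratio = np.var(est_known) / np.var(est_twostep)
print(f"variance ratio (known gamma0 / two-step): {ratio:.2f}")
```

The ratio should come out close to 2, reflecting that the two-step estimator here collapses to the sample mean of W (variance Var(W)/n), while plugging in the true [math]\displaystyle{ \gamma_0 }[/math] gives variance (Var(W)+Var(V))/n.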
References

  1. Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.
  2. Heckman, J.J., and R. Robb (1985), "Alternative Methods for Evaluating the Impact of Interventions: An Overview," Journal of Econometrics, 30, 239–267.