Two-step M-estimators involving MLE
The two-step M-estimator involving a maximum likelihood estimator (MLE) in the first step is a special case of the general two-step M-estimator. Thus, consistency and asymptotic normality of the estimator follow from the general result on two-step M-estimators.
Description
However, when the first-step estimator is an MLE, the two-step M-estimator is, under some assumptions, more efficient (i.e., has a smaller asymptotic variance) than the M-estimator with a known first-step parameter.[1]
Let [math]\displaystyle{ \{V_i, W_i, Z_i\}_{i=1}^n }[/math] be a random sample and let the second-step M-estimator [math]\displaystyle{ \widehat{\theta} }[/math] be:
[math]\displaystyle{ \widehat{\theta} }[/math] ≔ [math]\displaystyle{ \underset{\theta\in\Theta}{\operatorname{arg\,max}}\sum_{i}m(v_i, w_i, z_i : \theta, \widehat{\gamma}) }[/math]
where [math]\displaystyle{ \widehat{\gamma} }[/math] is the parameter estimated by maximum likelihood in the first step. For the MLE,
[math]\displaystyle{ \widehat{\gamma} }[/math] ≔ [math]\displaystyle{ \underset{\gamma\in\Gamma}{\operatorname{arg\,max}}\sum_{i}\log f(v_i : z_i, \gamma) }[/math]
where f is the conditional density of V given Z. Now, suppose that, given Z, V is conditionally independent of W. This assumption is called the conditional independence assumption or selection on observables.[1][2] Intuitively, this condition means that Z is a good predictor of V, so that once conditioned on Z, V has no systematic dependence on W. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:
[math]\displaystyle{ \operatorname{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}\operatorname{E}[g(\theta_0, \gamma_0)g(\theta_0, \gamma_0)']\operatorname{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1} }[/math]
where
[math]\displaystyle{ g(\theta, \gamma) := s(\theta, \gamma) - \operatorname{E}[s(\theta, \gamma)\nabla_\gamma d(\gamma)']\operatorname{E}[\nabla_\gamma d(\gamma)\nabla_\gamma d(\gamma)']^{-1}d(\gamma) }[/math]
[math]\displaystyle{ s(\theta, \gamma) := \nabla_\theta m(V, W, Z : \theta, \gamma) }[/math], [math]\displaystyle{ d(\gamma) := \nabla_\gamma \log f(V : Z, \gamma) }[/math], and [math]\displaystyle{ \nabla }[/math] denotes the partial derivative with respect to a row vector. In the case where [math]\displaystyle{ \gamma_0 }[/math] is known, the asymptotic variance is
[math]\displaystyle{ \operatorname{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1}\operatorname{E}[s(\theta_0, \gamma_0)s(\theta_0, \gamma_0)']\operatorname{E}[\nabla_\theta s(\theta_0, \gamma_0)]^{-1} }[/math]
and therefore, unless [math]\displaystyle{ \operatorname{E}[s(\theta, \gamma)\nabla_\gamma d(\gamma)'] = 0 }[/math], the two-step M-estimator is more efficient than the usual M-estimator. This fact suggests that even when [math]\displaystyle{ \gamma_0 }[/math] is known a priori, there is an efficiency gain from estimating [math]\displaystyle{ \gamma }[/math] by MLE. An application of this result can be found, for example, in treatment effect estimation.[1]
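As a concrete illustration of the two-step procedure, the following Python sketch runs both steps on simulated data. The data-generating process and all variable names are hypothetical, chosen only for illustration: the first step fits γ by MLE in a Gaussian model for V given Z (where the log-likelihood maximization reduces to least squares), and the second step maximizes the plug-in objective over θ.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (illustrative only):
# Z ~ N(0, 1); first-step model V | Z ~ N(gamma0 * Z, 1);
# second-step outcome W = theta0 * V + noise.
gamma0, theta0 = 1.5, 2.0
Z = rng.normal(size=n)
V = gamma0 * Z + rng.normal(size=n)
W = theta0 * V + rng.normal(size=n)

# Step 1: MLE of gamma in the conditional density f(v : z, gamma).
# For a Gaussian model with unit variance, maximizing the log-likelihood
# is equivalent to minimizing the sum of squared residuals.
def neg_loglik(gamma):
    return 0.5 * np.sum((V - gamma[0] * Z) ** 2)

gamma_hat = minimize(neg_loglik, x0=[0.0]).x[0]

# Step 2: M-estimation of theta with gamma_hat plugged in.
# Here m(v, w, z : theta, gamma) = -(w - theta * gamma * z)**2, i.e.
# we regress W on the first-step fitted values gamma_hat * Z.
def neg_obj(theta):
    return np.sum((W - theta[0] * gamma_hat * Z) ** 2)

theta_hat = minimize(neg_obj, x0=[0.0]).x[0]
print(gamma_hat, theta_hat)  # both should be close to the true values
```

With n = 5000 observations, both estimates land close to the true parameters (γ₀ = 1.5, θ₀ = 2.0), consistent with the consistency result cited above.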
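The efficiency claim can also be checked numerically. The sketch below uses the projection representation of the correction term that is standard in textbook treatments of two-step estimation with a first-step MLE (g equals s minus its linear projection on the first-step score d); this is an assumption of the illustration, as is the scalar model. Since g is a projection residual, its sample second moment cannot exceed that of s, which is exactly why the "meat" of the sandwich variance shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical scalar model (illustrative only):
# V | Z ~ N(gamma0 * Z, 1), W = theta0 * V + u, and second-step
# objective m = -(w - theta * gamma * z)**2 / 2, which gives
#   s(theta, gamma) = (W - theta * gamma * Z) * gamma * Z  (second-step score)
#   d(gamma)        = (V - gamma * Z) * Z                  (first-step MLE score)
gamma0, theta0 = 1.5, 2.0
Z = rng.normal(size=n)
V = gamma0 * Z + rng.normal(size=n)
W = theta0 * V + rng.normal(size=n)

S = (W - theta0 * gamma0 * Z) * gamma0 * Z   # s at the true parameters
D = (V - gamma0 * Z) * Z                     # d at the true parameters

# Correction: subtract the linear projection of s on d, g = s - b * d.
b = np.mean(S * D) / np.mean(D * D)
G = S - b * D

# E[g^2] <= E[s^2]: the two-step estimator's asymptotic variance
# is (weakly) smaller than with gamma0 treated as known.
print(np.mean(G**2), np.mean(S**2))
```

In this model E[s·d] is nonzero (V enters both scores through the same residual), so the reduction is strict, matching the condition E[s ∇γ d′] ≠ 0 in the text.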
References