# Two-step M-estimators


Two-step M-estimators deal with M-estimation problems that require preliminary estimation to obtain the parameter of interest. Two-step M-estimation differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in the asymptotic distribution is important for valid inference.

## Description

The class of two-step M-estimators includes Heckman's sample selection estimator, weighted non-linear least squares, and ordinary least squares with generated regressors.

To fix ideas, let $\displaystyle{ \{W_{i}\}^n_{i=1}\subseteq R^d }$ be an i.i.d. sample, and let $\displaystyle{ \Theta }$ and $\displaystyle{ \Gamma }$ be subsets of the Euclidean spaces $\displaystyle{ R^p }$ and $\displaystyle{ R^q }$, respectively. Given a function $\displaystyle{ m(\cdot,\cdot,\cdot): R^d \times \Theta \times \Gamma\rightarrow R }$, the two-step M-estimator $\displaystyle{ \hat\theta }$ is defined as:

$\displaystyle{ \hat \theta:=\arg\max_{\theta\in\Theta}\frac{1}{n}\sum_{i}m\bigl(W_{i},\theta,\hat\gamma\bigr) }$

where $\displaystyle{ \hat\gamma }$ is an estimate of the parameter $\displaystyle{ \gamma }$ obtained in the first step.
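As a concrete illustration of the definition, the following sketch (an assumed heteroskedastic linear model, not from the article; all variable names are illustrative) estimates a variance parameter in a first step and plugs it into the second-step objective:

```python
# A hedged sketch of a two-step M-estimator in the assumed model
#   y = theta*x + e,  Var(e | x) = x^(2*gamma).
# Step 1 estimates the variance exponent gamma; step 2 maximizes
#   (1/n) sum_i m(W_i, theta, gamma_hat)
# with m(W_i, theta, gamma) = -(y_i - theta*x_i)^2 * x_i^(-2*gamma).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 3.0, n)
theta_true, gamma_true = 2.0, 1.0
y = theta_true * x + x**gamma_true * rng.standard_normal(n)

# Step 1: preliminary OLS, then regress log(residual^2) on log(x);
# the slope estimates 2*gamma, the intercept absorbs E[log z^2].
theta_ols = np.sum(x * y) / np.sum(x * x)
log_r2 = np.log((y - theta_ols * x) ** 2)
gamma_hat = np.polyfit(np.log(x), log_r2, 1)[0] / 2.0

# Step 2: maximize the sample objective in theta with gamma_hat plugged in.
# The objective is quadratic in theta, so the argmax has the closed
# weighted-least-squares form.
w = x ** (-2.0 * gamma_hat)
theta_hat = np.sum(w * x * y) / np.sum(w * x * x)
```

The second step is an ordinary M-estimation problem once $\displaystyle{ \hat\gamma }$ is treated as fixed; the point of the theory above is that inference on $\displaystyle{ \hat\theta }$ must still account for the randomness of $\displaystyle{ \hat\gamma }$.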

Consistency of two-step M-estimators can be verified by checking the consistency conditions for usual M-estimators, although some modification might be necessary. In practice, the important condition to check is the identification condition. If $\displaystyle{ \hat\gamma\rightarrow\gamma^* }$ in probability, where $\displaystyle{ \gamma^* }$ is a non-random vector, then the identification condition is that $\displaystyle{ E[m(W_{1},\theta,\gamma^*)] }$ has a unique maximizer over $\displaystyle{ \Theta }$.

Under regularity conditions, two-step M-estimators are asymptotically normal. An important point to note is that the asymptotic variance of a two-step M-estimator is generally not the same as that of the usual M-estimator in which no first-step estimation is necessary. This fact is intuitive because $\displaystyle{ \hat\gamma }$ is a random object and its variability should influence the estimation of $\displaystyle{ \theta }$. However, there exists a special case in which the asymptotic variance of the two-step M-estimator takes the form as if there were no first-step estimation procedure. This special case occurs if:

$\displaystyle{ E\Bigl[\frac{\partial^2}{\partial\theta\,\partial\gamma}m(W_{1},\theta_{0},\gamma^*)\Bigr]=0 }$

where $\displaystyle{ \theta_{0} }$ is the true value of $\displaystyle{ \theta }$ and $\displaystyle{ \gamma^* }$ is the probability limit of $\displaystyle{ \hat\gamma }$. To interpret this condition, first note that under regularity conditions, $\displaystyle{ E\bigl[\frac{\partial}{\partial\theta}m(W_{1},\theta_{0},\gamma^*)\bigr]=0 }$, since $\displaystyle{ \theta_{0} }$ is the maximizer of $\displaystyle{ E[ m(W_{1},\theta, \gamma^*)] }$. The condition above then implies that a small perturbation in $\displaystyle{ \gamma }$ has no impact on this first-order condition. Thus, in large samples, the variability of $\displaystyle{ \hat\gamma }$ does not affect the argmax of the objective function, which explains the invariance of the asymptotic variance. Of course, this result is valid only as the sample size tends to infinity, so the finite-sample properties could be quite different.
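The cross-partial condition can be checked numerically. The sketch below uses an assumed weighted-least-squares objective (a textbook instance of this special case, not taken from the article; all names are illustrative):

```python
# A hedged numerical check that E[d^2 m / (d theta d gamma)] = 0 can hold,
# so that first-step estimation leaves the asymptotic variance unchanged.
# Assumed objective:
#   m(W, theta, gamma) = -(y - theta*x)^2 * x^(-2*gamma)
# whose cross-partial is -4*log(x) * x^(1-2*gamma) * (y - theta*x).  Its
# expectation at theta_0 is zero because E[y - theta_0*x | x] = 0.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(1.0, 3.0, n)
theta0, gamma0 = 2.0, 1.0
y = theta0 * x + x**gamma0 * rng.standard_normal(n)

cross = -4.0 * np.log(x) * x ** (1.0 - 2.0 * gamma0) * (y - theta0 * x)
avg_cross = np.mean(cross)  # close to zero in large samples
```

By contrast, when the cross-partial has non-zero expectation, the sample average above would settle at a non-zero value and the first-step variability would enter the asymptotic variance.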

## Involving MLE

A two-step M-estimator involving a maximum likelihood estimator (MLE) in the first step is a special case of the general two-step M-estimator, so consistency and asymptotic normality of the estimator follow from the general result on two-step M-estimators.

When the first-step estimator is an MLE, then under some assumptions the two-step M-estimator is more efficient (i.e. has smaller asymptotic variance) than the M-estimator with a known first-step parameter.

Let $\displaystyle{ \{V_i,W_i,Z_i\}^n_{i=1} }$ be a random sample, and let the second-step M-estimator $\displaystyle{ \widehat{\theta} }$ be the following:

$\displaystyle{ \widehat{\theta}:=\underset{\theta\in\Theta}{\operatorname{arg\,max}}\sum_{i}m(v_i,w_i,z_i;\theta,\widehat{\gamma}) }$

where $\displaystyle{ \widehat{\gamma } }$ is the parameter estimated by maximum likelihood in the first step. For the MLE,

$\displaystyle{ \widehat{\gamma}:=\underset{\gamma\in\Gamma}{\operatorname{arg\,max}}\sum_{i}\log f(v_{i}\mid z_{i},\gamma) }$

where $\displaystyle{ f }$ is the conditional density of $\displaystyle{ V }$ given $\displaystyle{ Z }$. Now, suppose that, given $\displaystyle{ Z }$, $\displaystyle{ V }$ is conditionally independent of $\displaystyle{ W }$. This assumption is called the conditional independence assumption or selection on observables. Intuitively, it means that $\displaystyle{ Z }$ is a good predictor of $\displaystyle{ V }$, so that once conditioned on $\displaystyle{ Z }$, $\displaystyle{ V }$ has no systematic dependence on $\displaystyle{ W }$. Under the conditional independence assumption, the asymptotic variance of the two-step estimator is:

$\displaystyle{ E[\nabla_{\theta}s(\theta_{0},\gamma_{0})]^{-1}\,E[g(\theta_{0},\gamma_{0})g(\theta_{0},\gamma_{0})']\,E[\nabla_{\theta}s(\theta_{0},\gamma_{0})]^{-1} }$

where $\displaystyle{ g(\theta,\gamma):=s(\theta,\gamma)-E[s(\theta,\gamma)d(\gamma)']\,E[d(\gamma)d(\gamma)']^{-1}d(\gamma) }$,

$\displaystyle{ s(\theta,\gamma):=\nabla_{\theta}m(V,W,Z;\theta,\gamma) }$, $\displaystyle{ d(\gamma):=\nabla_{\gamma}\log f(V\mid Z,\gamma) }$, and $\displaystyle{ \nabla }$ represents the partial derivative with respect to a row vector. In the case where $\displaystyle{ \gamma_{0} }$ is known, the asymptotic variance is $\displaystyle{ E[\nabla_{\theta}s(\theta_{0},\gamma_{0})]^{-1}\,E[s(\theta_{0},\gamma_{0})s(\theta_{0},\gamma_{0})']\,E[\nabla_{\theta}s(\theta_{0},\gamma_{0})]^{-1} }$, and therefore, unless $\displaystyle{ E[s(\theta_{0},\gamma_{0})d(\gamma_{0})']=0 }$, the two-step M-estimator is more efficient than the usual M-estimator. This fact suggests that even when $\displaystyle{ \gamma_{0} }$ is known a priori, there is an efficiency gain from estimating $\displaystyle{ \gamma }$ by MLE. An application of this result can be found, for example, in treatment effect estimation.
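The corrected sandwich can be estimated from sample analogues of its moments. Below is a minimal sketch (the function name and interface are hypothetical), assuming the residual-score form $\displaystyle{ g=s-E[sd']E[dd']^{-1}d }$ with $\displaystyle{ d }$ the first-step score:

```python
# A hedged sketch (names hypothetical) of the corrected asymptotic variance
# for a two-step M-estimator with an MLE first step, under the conditional
# independence assumption.
import numpy as np

def two_step_avar(S, D, H):
    """Estimate the sandwich H^{-1} E[g g'] H^{-1} from sample moments.

    S : (n, p) array of second-step scores  s_i
    D : (n, q) array of first-step scores   d_i
    H : (p, p) estimate of E[grad_theta s]
    """
    n = S.shape[0]
    Esd = S.T @ D / n  # sample analogue of E[s d']
    Edd = D.T @ D / n  # sample analogue of E[d d']
    # g_i = s_i - E[s d'] E[d d']^{-1} d_i  (row-wise)
    G = S - D @ np.linalg.solve(Edd, Esd.T)
    Egg = G.T @ G / n
    Hinv = np.linalg.inv(H)
    return Hinv @ Egg @ Hinv.T
```

Because $\displaystyle{ g }$ is the residual from projecting $\displaystyle{ s }$ on $\displaystyle{ d }$, the estimated $\displaystyle{ E[gg'] }$ is never larger than $\displaystyle{ E[ss'] }$, which is the efficiency-gain claim above in sample-moment form.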