De-sparsified lasso

The de-sparsified lasso is a method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in high-dimensional models.[1]

High-dimensional linear model

[math]\displaystyle{ Y = X\beta^0 + \epsilon }[/math] with [math]\displaystyle{ n \times p }[/math] design matrix [math]\displaystyle{ X =: [X_1,..., X_p] }[/math] (whose columns [math]\displaystyle{ X_j }[/math] are [math]\displaystyle{ n \times 1 }[/math] vectors), noise [math]\displaystyle{ \epsilon \sim N_n(0, \sigma^2_\epsilon I) }[/math] independent of [math]\displaystyle{ X }[/math], and an unknown [math]\displaystyle{ p \times 1 }[/math] regression vector [math]\displaystyle{ \beta^0 }[/math].

A standard estimator of the parameter is the lasso: [math]\displaystyle{ \hat{\beta}^n(\lambda) = \underset{\beta \in \mathbb{R} ^ p}{argmin} \ \frac{1}{2 n} \left\| Y - X \beta \right\| ^ 2 _ 2 + \lambda \left\| \beta \right\| _ 1 }[/math]
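For concreteness, the following is a minimal sketch of this lasso fit in Python with scikit-learn; the data-generating choices ([math]\displaystyle{ n }[/math], [math]\displaystyle{ p }[/math], the sparsity of [math]\displaystyle{ \beta^0 }[/math], the noise scale, and the value of [math]\displaystyle{ \lambda }[/math]) are illustrative assumptions, not taken from the source.

```python
# A minimal sketch of the lasso fit; all numbers are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 500                      # high-dimensional regime: p > n
beta0 = np.zeros(p)
beta0[:3] = [2.0, -1.5, 1.0]         # sparse true coefficient vector
X = rng.standard_normal((n, p))
Y = X @ beta0 + rng.standard_normal(n)

lam = 0.1
# scikit-learn's Lasso minimizes (1/(2n))||Y - Xb||^2 + alpha*||b||_1,
# which matches the objective above with alpha = lambda.
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_
```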

The de-sparsified lasso modifies the lasso estimator, which satisfies the Karush–Kuhn–Tucker conditions,[2] as follows:

[math]\displaystyle{ \hat{\beta}^n(\lambda,M) = \hat{\beta}^n(\lambda) + \frac{1}{n} M X^T(Y- X \hat{\beta}^n (\lambda)) }[/math]

where [math]\displaystyle{ M \in R ^{p\times p} }[/math] is an arbitrary matrix. In practice, [math]\displaystyle{ M }[/math] is chosen as a surrogate for the inverse of the empirical covariance matrix [math]\displaystyle{ \hat{\Sigma} := X^T X / n }[/math], for example via the nodewise lasso described below.
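Continuing the sketch above, the bias correction is a single matrix expression once [math]\displaystyle{ M }[/math] is available. Here [math]\displaystyle{ M }[/math] is built as a ridge-regularized inverse of [math]\displaystyle{ \hat{\Sigma} }[/math] purely for illustration; the nodewise-lasso construction used in the original paper is sketched further below.

```python
# Continues the previous sketch. The ridge-regularized inverse is an
# illustrative stand-in for the nodewise-lasso surrogate M.
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.5 * np.eye(p))   # surrogate inverse covariance

# De-sparsified lasso: lasso fit plus a one-step bias correction.
beta_desparse = beta_hat + M @ X.T @ (Y - X @ beta_hat) / n
```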

Generalized linear model

De-sparsifying [math]\displaystyle{ l_1 }[/math]-norm penalized estimators and the corresponding theory can also be applied to models with convex loss functions, such as generalized linear models.

Consider the following [math]\displaystyle{ 1 \times p }[/math] vectors of covariates [math]\displaystyle{ x_i \in \mathcal{X} \subset R^p }[/math] and univariate responses [math]\displaystyle{ y_i \in \mathcal{Y} \subset R }[/math] for [math]\displaystyle{ i = 1,...,n }[/math].

We have a loss function [math]\displaystyle{ \rho_\beta(y,x) = \rho(y, x \beta) \ (\beta \in R^p) }[/math], which is assumed to be strictly convex in [math]\displaystyle{ \beta \in R^p }[/math].

The [math]\displaystyle{ l_1 }[/math]-norm regularized estimator is [math]\displaystyle{ \hat{\beta}=\underset{\beta}{argmin}(P_n \rho_\beta + \lambda\left\| \beta \right\|_1) }[/math], where [math]\displaystyle{ P_n \rho_\beta := \frac{1}{n}\sum_{i=1}^n \rho(y_i, x_i \beta) }[/math] denotes the empirical average of the loss.
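As a concrete instance, with the logistic loss this is an [math]\displaystyle{ l_1 }[/math]-penalized logistic regression; a minimal sketch follows. Note that scikit-learn parameterizes the penalty strength through [math]\displaystyle{ C \approx 1/(n\lambda) }[/math], and the binary responses below are an illustrative construction.

```python
# A sketch of an l1-penalized GLM fit (logistic loss); continues the
# earlier sketch. C is an illustrative choice (roughly 1/(n*lambda)).
from sklearn.linear_model import LogisticRegression

y_binary = (Y > 0).astype(int)       # illustrative binary responses
glm_l1 = LogisticRegression(penalty="l1", solver="liblinear",
                            C=1.0, fit_intercept=False).fit(X, y_binary)
beta_glm = glm_l1.coef_.ravel()
```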

Similarly, the lasso for nodewise regression with matrix input is defined as follows. Denote by [math]\displaystyle{ \hat{\Sigma} }[/math] a matrix which we want to approximately invert using the nodewise lasso.

For each [math]\displaystyle{ j = 1,...,p }[/math], the nodewise lasso estimate is: [math]\displaystyle{ \hat{\gamma}_j := \underset{\gamma \in R^{p-1}}{argmin}\left(\hat{\Sigma}_{j,j} - 2 \hat{\Sigma}_{j,/j} \gamma + \gamma^T \hat{\Sigma}_{/j,/j} \gamma + 2 \lambda_j\left\|\gamma\right\|_1\right) }[/math]

where [math]\displaystyle{ \hat{\Sigma}_{j,/j} }[/math] denotes the [math]\displaystyle{ j }[/math]th row of [math]\displaystyle{ \hat{\Sigma} }[/math] without the diagonal element [math]\displaystyle{ (j,j) }[/math], and [math]\displaystyle{ \hat{\Sigma}_{/j,/j} }[/math] is the submatrix without the [math]\displaystyle{ j }[/math]th row and [math]\displaystyle{ j }[/math]th column.
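With [math]\displaystyle{ \hat{\Sigma} = X^T X / n }[/math], each nodewise problem reduces to a lasso regression of [math]\displaystyle{ X_j }[/math] on the remaining columns. The sketch below assembles a surrogate inverse [math]\displaystyle{ \hat{\Theta} }[/math] row by row, normalizing each row by [math]\displaystyle{ \hat{\tau}_j^2 = \hat{\Sigma}_{j,j} - \hat{\Sigma}_{j,/j}\hat{\gamma}_j }[/math] as in van de Geer et al. (2014);[1] using a single [math]\displaystyle{ \lambda_j }[/math] for all [math]\displaystyle{ j }[/math] is an illustrative simplification.

```python
# A sketch of the nodewise lasso: builds a surrogate inverse Theta_hat
# of Sigma_hat = X^T X / n, usable as M in the de-sparsified lasso.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X, lam_j=0.1):
    n, p = X.shape
    Theta_hat = np.zeros((p, p))
    for j in range(p):
        X_minus_j = np.delete(X, j, axis=1)
        # Lasso regression of X_j on the remaining columns solves the
        # nodewise objective above (up to a constant factor).
        gamma_j = Lasso(alpha=lam_j, fit_intercept=False).fit(
            X_minus_j, X[:, j]).coef_
        # tau_j^2 = Sigma_{j,j} - Sigma_{j,/j} gamma_j
        tau2_j = X[:, j] @ (X[:, j] - X_minus_j @ gamma_j) / n
        row = np.zeros(p)
        row[j] = 1.0
        row[np.arange(p) != j] = -gamma_j
        Theta_hat[j] = row / tau2_j
    return Theta_hat

# Theta_hat = nodewise_lasso(X) can then serve as the matrix M above.
```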

References

  1. Geer, Sara van de; Bühlmann, Peter; Ritov, Ya'acov; Dezeure, Ruben (2014). "On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models". The Annals of Statistics 42 (3): 1162–1202. doi:10.1214/14-AOS1221.
  2. Tibshirani, Ryan; Gordon, Geoff. "Karush-Kuhn-Tucker conditions". https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf.