arXiv:1205.4795v2 [math.ST] 27 Mar 2013
Submitted to the Annals of Statistics

ADAPTIVE ROBUST VARIABLE SELECTION

By Jianqing Fan∗, Yingying Fan† and Emre Barut

Princeton University, University of Southern California and Princeton University

Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use penalized quantile regression with a weighted $L_1$-penalty, called the weighted robust Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias problem induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called the adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed based on the $L_1$-penalized quantile regression estimate from the first step. This two-step procedure is justified theoretically to possess the oracle property and asymptotic normality. Numerical studies demonstrate the favorable finite-sample performance of the AR-Lasso.

1. Introduction. The advent of modern technology makes it easier to collect massive, large-scale data sets. A common feature of these data sets is that the number of covariates greatly exceeds the number of observations, a regime opposite to conventional statistical settings. For example, portfolio allocation with hundreds of stocks in finance involves a covariance matrix of about tens of thousands of parameters, but the sample sizes are often only in the order of hundreds (e.g., daily data over a year period (Fan et al., 2008)). Genome-wide association studies in biology involve hundreds of thousands of single-nucleotide polymorphisms (SNPs), but the available sample size
is usually in the hundreds, too. Data sets with a large number of variables but a relatively small sample size pose great, unprecedented challenges and opportunities for statistical analysis.
Regularization methods have been widely used for high-dimensional variable selection (Bickel and Li, 2006; Bickel et al., 2009; Efron et al., 2007; Fan and Li, 2001; Lv and Fan, 2009; Tibshirani, 1996; Zhang, 2010; Zou, 2006). Yet, most existing methods, such as penalized least-squares or penalized likelihood (Fan and Lv, 2011), are designed for light-tailed distributions. Zhao and Yu (2006) established the irrepresentable conditions for the model selection consistency of the Lasso estimator. Fan and Li (2001) studied the oracle properties of nonconcave penalized likelihood estimators for fixed dimensionality. Lv and Fan (2009) investigated the penalized least-squares estimator with folded-concave penalty functions in the ultra-high dimensional setting and established a nonasymptotic weak oracle property. Fan and Lv (2008) proposed and investigated the sure independence screening method in the setting of light-tailed distributions. The robustness of the aforementioned methods has not yet been thoroughly studied and well understood.

Robust regularization methods such as the least absolute deviation (LAD) regression and quantile regression have been used for variable selection in the case of fixed dimensionality. See, for example, Li and Zhu (2008); Wang, Li and Jiang (2007); Wu and Liu (2009); Zou and Yuan (2008). The penalized composite likelihood method was proposed in Bradic et al. (2011) for robust estimation in ultra-high dimensions, with a focus on the efficiency of the method. They still assumed sub-Gaussian tails. Belloni and Chernozhukov (2011) studied the $L_1$-penalized quantile regression in high-dimensional sparse models where the dimensionality could be larger than the sample size. We refer to their method as the robust Lasso (R-Lasso). They showed that the R-Lasso estimate is consistent at the near-oracle rate, gave conditions under which the selected model includes the true model, and derived bounds on the size of the selected model, uniformly in a compact set of quantile indices. Wang (2012) studied the $L_1$-penalized LAD regression and showed that the estimate achieves near-oracle risk performance with a nearly universal penalty parameter, and also established a sure screening property for such an estimator. van de Geer and Müller (2012) obtained bounds on the prediction error of a large class of $L_1$-penalized estimators, including quantile regression. Wang et al. (2012) considered the nonconvex penalized quantile regression in the ultra-high dimensional setting and showed that the oracle estimate belongs to the set of local minima of the nonconvex penalized quantile regression, under mild assumptions on the error distribution.
In this paper, we introduce the penalized quantile regression with the weighted $L_1$-penalty (WR-Lasso) for robust regularization, as in Bradic et al. (2011). The weights are introduced to reduce the bias problem induced by the $L_1$-penalty. The choice of the weights provides flexibility in the shrinkage estimation of the regression coefficients. WR-Lasso shares a similar spirit to the folded-concave penalized quantile regression (Wang et al., 2012; Zou and Li, 2008), but avoids the nonconvex optimization problem. We establish conditions on the error distribution in order for the WR-Lasso to successfully recover the true underlying sparse model with asymptotic probability one. It turns out that the required condition is much weaker than the sub-Gaussian assumption in Bradic et al. (2011). The only condition we impose is that the density function of the error has the Lipschitz property in a neighborhood around 0. This includes a large class of heavy-tailed distributions such as the stable distributions, including the Cauchy distribution. It also covers the double exponential distribution, whose density function is nondifferentiable at the origin.
Unfortunately, because of the penalized nature of the estimator, the WR-Lasso estimate has a bias. In order to reduce the bias, the weights in WR-Lasso need to be chosen adaptively according to the magnitudes of the unknown true regression coefficients, which makes this bias reduction infeasible for practical applications.
To make the bias reduction feasible, we introduce the adaptive robust Lasso (AR-Lasso). The AR-Lasso first runs R-Lasso to obtain an initial estimate, and then computes the weight vector of the weighted $L_1$-penalty according to a decreasing function of the magnitude of the initial estimate. After that, AR-Lasso runs WR-Lasso with the computed weights. We formally establish the model selection oracle property of AR-Lasso in the context of Fan and Li (2001) with no assumptions made on the tail distribution of the model error. In particular, the asymptotic normality of the AR-Lasso is formally established.
This paper is organized as follows. First, we introduce our robust estimators in Section 2. Then, to demonstrate the advantages of our estimator, we show in Section 3 with a simple example that Lasso behaves sub-optimally when the noise has heavy tails. In Section 4.1, we study the performance of the oracle-assisted regularization estimator. Then in Section 4.2, we show that when the weights are adaptively chosen, WR-Lasso has the model selection oracle property and performs as well as the oracle-assisted regularization estimate. In Section 4.3, we prove the asymptotic normality of our proposed estimator. The feasible estimator, AR-Lasso, is investigated in Section 5. Finally, Section 6 presents the results of the simulation studies as well as a genome-wide association study with SNPs. The proofs are relegated to the Appendix.
2. Adaptive Robust Lasso. Consider the linear regression model

(2.1) $y = X\beta + \varepsilon,$
where $y$ is an $n$-dimensional response vector, $X = (x_1, \ldots, x_n)^T = (\tilde{x}_1, \cdots, \tilde{x}_p)$ is an $n \times p$ fixed design matrix, $\beta = (\beta_1, \ldots, \beta_p)^T$ is a $p$-dimensional regression coefficient vector, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is an $n$-dimensional error vector whose components are independently distributed and satisfy $P(\varepsilon_i \leq 0) = \tau$ for some known constant $\tau \in (0, 1)$. Under this model, $x_i^T \beta$ is the conditional $\tau$th quantile of $y_i$ given $x_i$. We impose no conditions on the heaviness of the tail probability or the homoscedasticity of $\varepsilon_i$. We consider a challenging setting in which $\log p = o(n^b)$ with some constant $b > 0$. To ensure model identifiability and to enhance model fitting accuracy and interpretability, the true regression coefficient vector $\beta^*$ is commonly imposed to be sparse, with only a small proportion of nonzeros (Fan and Li, 2001; Tibshirani, 1996). Denoting the number of nonzero elements of the true regression coefficients by $s_n$, we allow $s_n$ to slowly diverge with the sample size $n$ and assume that $s_n = o(n)$. To ease the presentation, we suppress the dependence of $s_n$ on $n$ whenever there is no confusion. Without loss of generality, we write $\beta^* = (\beta_1^{*T}, \mathbf{0}^T)^T$, namely, the first $s$ entries are non-vanishing. The true model is denoted by

$\mathcal{M}_* = \mathrm{supp}(\beta^*) = \{1, \cdots, s\},$

and its complement, $\mathcal{M}_*^c = \{s + 1, \cdots, p\}$, represents the set of noise variables.
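Because everything below is stated for this model, a small simulation is a convenient way to sanity-check implementations. The following sketch (not part of the paper; the function name and the choice of Cauchy errors are illustrative) draws data from (2.1) so that $P(\varepsilon_i \leq 0) = \tau$ holds exactly even though the errors have no finite moments:

```python
import numpy as np
from scipy.stats import cauchy

def simulate_quantile_model(n=200, p=1000, s=5, tau=0.5, seed=0):
    """Draw (X, y) from model (2.1) with heavy-tailed Cauchy errors,
    shifted so that P(eps_i <= 0) = tau, as the model requires."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # standardize each column to have L2-norm sqrt(n), as assumed below
    X *= np.sqrt(n) / np.linalg.norm(X, axis=0)
    beta_true = np.zeros(p)
    beta_true[:s] = rng.uniform(1.0, 2.0, size=s)  # first s entries nonzero
    # subtracting the tau-quantile of the Cauchy law makes 0 its tau-quantile
    eps = cauchy.rvs(size=n, random_state=seed) - cauchy.ppf(tau)
    y = X @ beta_true + eps
    return X, y, beta_true
```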
We consider a fixed design matrix in this paper and denote by $S = (S_1, \cdots, S_n)^T = (\tilde{x}_1, \cdots, \tilde{x}_s)$ the submatrix of $X$ corresponding to the covariates whose coefficients are non-vanishing. These variables will be referred to as the signal covariates, and the rest will be called noise covariates. The set of columns that correspond to the noise covariates is denoted by $Q = (Q_1, \cdots, Q_n)^T = (\tilde{x}_{s+1}, \cdots, \tilde{x}_p)$. We standardize each column of $X$ to have $L_2$-norm $\sqrt{n}$. To obtain a sparse estimate of $\beta^*$, consider the penalized quantile regression

(2.2) $\min_{\beta} \sum_{i=1}^n \rho_\tau(y_i - x_i^T \beta) + n \sum_{j=1}^p p_{\lambda_n}(|\beta_j|),$
where $\rho_\tau(u) = u(\tau - \mathbf{1}\{u \leq 0\})$ is the quantile loss function, and $p_{\lambda_n}(\cdot)$ is a nonnegative penalty function on $[0, \infty)$ with a regularization parameter $\lambda_n \geq 0$. The use of the quantile loss function in (2.2) is to overcome the difficulty of heavy tails of the error distribution. Since $P(\varepsilon_i \leq 0) = \tau$, (2.2) can be interpreted as the sparse estimation of the conditional $\tau$th quantile; when $\tau = 1/2$, the loss reduces to $\rho_{1/2}(u) = |u|/2$ and (2.2) becomes penalized LAD regression. Regarding the choice of $p_{\lambda_n}(\cdot)$, it was demonstrated in Lv and Fan (2009) and Fan and Lv (2011) that folded-concave penalties are more advantageous for variable selection in high dimensions than convex ones such as the $L_1$-penalty. It is, however, computationally more challenging to minimize the objective function in (2.2) when $p_{\lambda_n}(\cdot)$ is folded-concave. Noting that with a good initial estimate $\widehat{\beta}^{\text{ini}} = (\widehat{\beta}^{\text{ini}}_1, \cdots, \widehat{\beta}^{\text{ini}}_p)^T$ of the true coefficient vector, we have

$p_{\lambda_n}(|\beta_j|) \approx p_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|) + p'_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|)\big(|\beta_j| - |\widehat{\beta}^{\text{ini}}_j|\big).$
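To make explicit the step this approximation enables (a short derivation added here for clarity; it is implicit in the original): substituting it into (2.2) and dropping the terms that do not depend on $\beta$ leaves

$\sum_{i=1}^n \rho_\tau(y_i - x_i^T \beta) + n \sum_{j=1}^p p'_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|)\,|\beta_j| + C,$

where $C$ collects the constants. This is a weighted $L_1$-criterion in which the $j$th coordinate is penalized at level $p'_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|)$; it matches the criterion (2.3) below upon identifying $\lambda_n d_j$ with $p'_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|)$.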
Thus, instead of (2.2), we consider the following weighted $L_1$-regularized quantile regression:
(2.3) $L_n(\beta) = \sum_{i=1}^n \rho_\tau(y_i - x_i^T \beta) + n \lambda_n \|d \circ \beta\|_1,$
where $d = (d_1, \cdots, d_p)^T$ is the vector of non-negative weights, and $\circ$ is the Hadamard product, that is, the componentwise product of two vectors. This motivates us to define the weighted robust Lasso (WR-Lasso) estimate as the global minimizer of the convex function $L_n(\beta)$ for a given non-stochastic weight vector:

(2.4) $\widehat{\beta} = \arg\min_{\beta} L_n(\beta).$

The uniqueness of the global minimizer is easily guaranteed by adding a negligible $L_2$-regularization in implementation. In particular, when $d_j = 1$ for all $j$, the method will be referred to as the robust Lasso (R-Lasso).
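Since $L_n(\beta)$ is piecewise linear in $\beta$, the minimization in (2.4) can be recast as a linear program. The following sketch (not from the paper; a minimal illustration with hypothetical function names, using scipy) splits $\beta$ and the residuals into positive and negative parts, under which both the quantile loss and the weighted $L_1$-penalty become linear:

```python
import numpy as np
from scipy.optimize import linprog

def wr_lasso(X, y, tau, lam, d):
    """Solve the WR-Lasso problem (2.4) as a linear program.

    With beta = a - b and residual y - X beta = u - v (a, b, u, v >= 0),
    rho_tau(u_i - v_i) = tau*u_i + (1 - tau)*v_i at any LP optimum,
    and the penalty n*lam*sum_j d_j|beta_j| becomes n*lam*d'(a + b).
    """
    n, p = X.shape
    # linear objective over the stacked variables (a, b, u, v)
    c = np.concatenate([n * lam * d, n * lam * d,
                        tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # equality constraint: X a - X b + u - v = y
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]   # recover beta = a - b
```

Setting `d = np.ones(p)` recovers R-Lasso; the paper's negligible $L_2$-regularization for uniqueness is omitted here for simplicity.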
The adaptive robust Lasso (AR-Lasso) refers specifically to the two-stage procedure in which the stochastic weights $\widehat{d}_j = p'_{\lambda_n}(|\widehat{\beta}^{\text{ini}}_j|)$ for $j = 1, \cdots, p$ are used in the second step of WR-Lasso and are constructed using a concave penalty $p_{\lambda_n}(\cdot)$ and the initial estimates $\widehat{\beta}^{\text{ini}}_j$ from the first step. In practice, we recommend using R-Lasso as the initial estimate and then using SCAD to compute the weights in AR-Lasso. The asymptotic result of this specific AR-Lasso is summarized in Corollary 1 in Section 5 for the ultra-high dimensional robust regression problem. This is a main contribution of the paper.
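As a concrete rendering of this recommendation (a sketch assuming the hypothetical `wr_lasso` routine above; the SCAD derivative is the standard one of Fan and Li (2001) with $a = 3.7$, and the shared tuning parameter is illustrative):

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative p'_lam(|t|) of the SCAD penalty (Fan and Li, 2001):
    equals lam on [0, lam], decays linearly to 0 on (lam, a*lam], then 0."""
    t = np.abs(t)
    return lam * ((t <= lam)
                  + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

def ar_lasso(X, y, tau, lam):
    """Two-step AR-Lasso: R-Lasso initial fit, then WR-Lasso with
    stochastic weights d_j = p'_lam(|beta_ini_j|)."""
    p = X.shape[1]
    beta_ini = wr_lasso(X, y, tau, lam, d=np.ones(p))  # step 1: R-Lasso
    d_hat = scad_derivative(beta_ini, lam)             # step 2: SCAD weights
    return wr_lasso(X, y, tau, lam, d=d_hat)
```

Coefficients with large initial estimates receive weight near zero and are thus essentially unpenalized in the second step, which is how the two-step procedure reduces the bias of the $L_1$-penalty.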