The Annals of Statistics
2011, Vol. 39, No. 1, 174–200
DOI: 10.1214/10-AOS832
© Institute of Mathematical Statistics, 2011
FOCUSED INFORMATION CRITERION AND MODEL AVERAGING FOR GENERALIZED ADDITIVE PARTIAL LINEAR MODELS
BY XINYU ZHANG1 AND HUA LIANG2
Chinese Academy of Sciences and University of Rochester
We study model selection and model averaging in generalized additive partial linear models (GAPLMs). Polynomial spline is used to approximate nonparametric functions. The corresponding estimators of the linear parameters are shown to be asymptotically normal. We then develop a focused information criterion (FIC) and a frequentist model average (FMA) estimator on the basis of the quasi-likelihood principle and examine theoretical properties of the FIC and FMA. The major advantages of the proposed procedures over the existing ones are their computational expediency and theoretical reliability. Simulation experiments have provided evidence of the superiority of the proposed procedures. The approach is further applied to a real-world data example.
1. Introduction. Generalized additive models, which are a generalization of the generalized linear models and involve a summand of one-dimensional nonparametric functions instead of a summand of linear components, have been widely used to explore the complicated relationships between a response to treatment and predictors of interest [Hastie and Tibshirani (1990)]. Various attempts are still being made to balance the interpretation of generalized linear models and the flexibility of generalized additive models such as generalized additive partial linear models (GAPLMs), in which some of the additive component functions are linear, while the remaining ones are modeled nonparametrically [Härdle et al. (2004a, 2004b)].
A special case of a GAPLM with a single nonparametric component, the generalized partial linear model (GPLM), has been well studied in the literature; see, for example, Severini and Staniswalis (1994), Lin and Carroll (2001), Hunsberger (1994), Hunsberger et al. (2002) and Liang (2008). The profile quasi-likelihood procedure has generally been used, that is, the estimation of GPLM is made computationally feasible by the idea that estimates of the parameters can be found for a known nonparametric function, and an estimate of the nonparametric function can be found for the estimated parameters. Severini and Staniswalis (1994) showed that the resulting estimators of the parameter are asymptotically normal and that estimators of the nonparametric functions are consistent in supremum norm. The computational algorithm involves searching for maxima of global and local likelihoods simultaneously. It is worthwhile to point out that studying GPLM is easier than studying GAPLMs, partly because there is only one nonparametric term in GPLM. Correspondingly, implementation of the estimation for GPLM is simpler than for GAPLMs. Nevertheless, the GAPLMs are more flexible and useful than GPLM because the former allow several nonparametric terms for some covariates and parametric terms for others, and thus it is possible to explore more complex relationships between the response variables and covariates. For example, Shiboski (1998) used a GAPLM to study AIDS clinical trial data and Müller and Rönz (2000) used a GAPLM to carry out credit scoring. However, few theoretical results are available for GAPLMs, due to their general flexibility. In this article, we shall study estimation of GAPLMs using polynomial spline, establish asymptotic normality for the estimators of the linear parameters and develop a focused information criterion (FIC) for model selection and a frequentist model averaging (FMA) procedure in construction of the confidence intervals for the focus parameters with improved coverage probability.
Received February 2010; revised May 2010.
1 Supported in part by the National Natural Science Foundation of China Grants 70625004 and 70933003.
2 Supported in part by NSF Grant DMS-08-06097.
AMS 2000 subject classifications. Primary 62G08; secondary 62G20, 62G99.
Key words and phrases. Additive models, backfitting, focus parameter, generalized partially linear models, marginal integration, model average, model selection, polynomial spline, shrinkage methods.
We know that traditional model selection methods such as the Akaike information criterion [AIC, Akaike (1973)] and the Bayesian information criterion [BIC, Schwarz (1978)] aim to select a model with good overall properties, but the selected model is not necessarily good for estimating a specific parameter under consideration, which may be a function of the model parameters; see an inspiring example in Section 4.4 of Claeskens and Hjort (2003). Exploring the data set from the Wisconsin epidemiologic study of diabetic retinopathy, Claeskens, Croux and van Kerckhoven (2006) also noted that different models are suitable for different patient groups. This occurrence has been confirmed by Hand and Vinciotti (2003) and Hansen (2005). Motivated by this concern, Claeskens and Hjort (2003) proposed a new model selection criterion, FIC, which is an unbiased estimate of the limiting risk for the limit distribution of an estimator of the focus parameter, and systematically developed a general asymptotic theory for the proposed criterion. More recently, FIC has been studied in several models. Hjort and Claeskens (2006) developed the FIC for the Cox hazard regression model and applied it to a study of skin cancer; Claeskens, Croux and van Kerckhoven (2007) introduced the FIC for autoregressive models and used it to predict the net number of new personal life insurance policies for a large insurance company.
The existing model selection methods may arrive at a model which is thought to be able to capture the main information of the data, and to have been decided in advance of the data analysis. Such an approach may ignore the uncertainty introduced by model selection. Thus, the reported confidence intervals are too narrow or shift away from the correct location, and the corresponding coverage probabilities of the resulting confidence intervals can substantially deviate from the nominal level [Danilov and Magnus (2004) and Shen, Huang and Ye (2004)]. Model averaging, as an alternative to model selection, not only provides a kind of insurance against selecting a very poor model, but can also avoid model selection instability [Yang (2001) and Leung and Barron (2006)] by weighting/smoothing estimators across several models, instead of relying entirely on a single model selected by some model selection criterion. As a consequence, analysis of the distribution of model averaging estimators can improve coverage probabilities. This strategy has been adopted and studied in the literature, for example, Draper (1995), Buckland, Burnham and Augustin (1997), Burnham and Anderson (2002), Danilov and Magnus (2004) and Leeb and Pötscher (2006). A seminal work, Hjort and Claeskens (2003), developed asymptotic distribution theories for estimation and inference after model selection and model averaging across parametric models. See Claeskens and Hjort (2008) for a comprehensive survey on FIC and model averaging.
FIC and FMA have been well studied for parametric models. However, few efforts have been made to study FIC and FMA for semiparametric models. To the best of our knowledge, only Claeskens and Carroll (2007) studied FMA in semiparametric partial linear models with a univariate nonparametric component. The existing results are hard to extend directly to GAPLMs, for the following reasons: (i) there exist nonparametric components in GAPLMs, so the ordinary likelihood method cannot be directly used in estimation for GAPLMs; (ii) unlike the semiparametric partial linear models in Claeskens and Carroll (2007), GAPLMs allow for multivariate covariate consideration in nonparametric components and also allow for the mean of the response variable to be connected to the covariates by a link function, which means that the binary/count response variable can be considered in the model. Thus, developing FIC and FMA procedures for GAPLMs and establishing asymptotic properties for these procedures is by no means straightforward. Aiming at these two goals, we first need to appropriately estimate the coefficients of the parametric components (hereafter, we call the coefficients “linear parameters”).
There are two commonly used estimation approaches for GAPLMs: the first is local scoring backfitting, proposed by Buja, Hastie and Tibshirani (1989); the second is an application of the marginal integration approach on the nonparametric component [Linton and Nielsen (1995)]. However, theoretical properties of the former are not well understood since it is only defined implicitly as the limit of a complicated iterative algorithm, while the latter suffers from the curse of dimensionality [Härdle et al. (2004a)], which may lead to an increase in the computational burden and which also conflicts with the purpose of using a GAPLM, that is, dimension reduction. Therefore, in this article, we apply polynomial spline to approximate nonparametric functions in GAPLMs. After the spline basis is chosen, the nonparametric components are replaced by a linear combination of spline basis functions, and the coefficients can then be estimated by an efficient one-step maximizing procedure. Since the polynomial-spline-based method solves much smaller systems of equations than kernel-based methods that solve larger systems (which may lead to identifiability problems), our polynomial-spline-based procedures can substantially reduce the computational burden. See a similar discussion about this computational issue in Yu, Park and Mammen (2008), in the generalized additive models context.
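The idea in the preceding paragraph, replacing a nonparametric component by a linear combination of spline basis functions and then maximizing a single likelihood over all coefficients, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic link, a single nonparametric component, a degree-1 truncated-power basis and plain Newton-Raphson iterations, and every name in it (`linear_spline_basis`, `fit_logistic`, the simulated data) is hypothetical.

```python
import numpy as np

def linear_spline_basis(x, n_interior):
    """Truncated-power basis of degree 1: 1, x, (x - k_1)_+, ..., (x - k_J)_+."""
    knots = np.arange(1, n_interior + 1) / (n_interior + 1)   # equally spaced interior knots
    hinge = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, hinge])

def fit_logistic(design, y, n_iter=25):
    """Maximize the Bernoulli log-likelihood by Newton-Raphson (IRLS)."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        u = 1.0 / (1.0 + np.exp(-design @ coef))      # fitted means under the logit link
        w = u * (1.0 - u)                             # Bernoulli variance function
        score = design.T @ (y - u)
        hessian = design.T @ (design * w[:, None])
        coef = coef + np.linalg.solve(hessian, score)
    return coef

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(size=n)                               # covariate entering nonparametrically
Z = rng.normal(size=(n, 2))                           # covariates entering linearly
beta_true = np.array([1.0, -0.5])
eta = np.sin(2.0 * np.pi * x)                         # hypothetical smooth component
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(eta + Z @ beta_true))))

# Linear covariates and spline basis columns are fitted jointly in one maximization.
design = np.column_stack([Z, linear_spline_basis(x, n_interior=4)])
coef = fit_logistic(design, y)
beta_hat = coef[:2]                                   # estimates of the linear parameters
```

Because the spline coefficients and the linear parameters sit in one design matrix, each iteration solves a system whose size is the number of linear covariates plus the number of basis functions, which is the source of the computational saving described above.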
The use of polynomial spline in generalized nonparametric models can be traced back to Stone (1986), where the rate of convergence of the polynomial spline estimates for the generalized additive model was first obtained. Stone (1994) and Huang (1998) investigated the polynomial spline estimation for the generalized functional ANOVA model. In a widely discussed paper, Stone et al. (1997) presented a completely theoretical setting of polynomial spline approximation, with applications to a wide array of statistical problems, ranging from least-squares regression, density and conditional density estimation, and generalized regression such as logistic and Poisson regression, to polychotomous regression and hazard regression. Recently, Xue and Yang (2006) studied estimation in the additive coefficient model with continuous response using polynomial spline to approximate the coefficient functions. Sun, Kopciuk and Lu (2008) used polynomial spline in partially linear single-index proportional hazards regression models. Fan, Feng and Song (2009) applied polynomial spline to develop nonparametric independence screening in sparse ultra-high-dimensional additive models. Few attempts have been made to study polynomial spline for GAPLMs, due to the extreme technical difficulties involved.
The remainder of this article is organized as follows. Section 2 sets out the model framework and provides the polynomial spline estimation and asymptotic normality of estimators. Section 3 introduces the FIC and FMA procedures and constructs confidence intervals for the focus parameters on the basis of FMA estimators. A simulation study and real-world data analysis are presented in Sections 4 and 5, respectively. Regularity conditions and technical proofs are presented in the Appendix.
2. Model framework and estimation. We consider a GAPLM where the response $Y$ is related to covariates $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ and $Z = (Z_1, \ldots, Z_d)^T \in \mathbb{R}^d$. Let the unknown mean response $u(x, z) = E(Y \mid X = x, Z = z)$ and the conditional variance function be defined by a known positive function $V$, $\mathrm{var}(Y \mid X = x, Z = z) = V\{u(x, z)\}$. In this article, the mean function $u$ is defined via a known link function $g$ by an additive linear function

$$g\{u(x, z)\} = \sum_{\alpha=1}^{p} \eta_\alpha(x_\alpha) + z^T \beta, \tag{2.1}$$

where $x_\alpha$ is the $\alpha$th element of $x$, $\beta$ is a $d$-dimensional regression parameter and the $\eta_\alpha$'s are unknown smooth functions. To ensure identifiability, we assume that $E\{\eta_\alpha(X_\alpha)\} = 0$ for $1 \le \alpha \le p$.
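To make model (2.1) concrete, the following minimal sketch draws data from one instance of it. The choices are assumptions made for illustration and do not come from the article: a logit link $g$, so that $V(u) = u(1 - u)$, $p = 2$ additive components with hypothetical forms, uniform $X_\alpha$ on $[0, 1]$ and standard normal $Z$. The function name `simulate_gaplm` is likewise hypothetical.

```python
import numpy as np

def simulate_gaplm(n, beta, seed=0):
    """Draw (Y, X, Z) from a logistic GAPLM with p = 2 additive components."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))          # covariates entering nonparametrically
    Z = rng.normal(size=(n, len(beta)))             # covariates entering linearly
    # Hypothetical smooth components, centred so that E{eta_alpha(X_alpha)} = 0
    # when X_alpha is uniform on [0, 1] (the identifiability constraint).
    eta1 = np.sin(2.0 * np.pi * X[:, 0])            # integrates to 0 over [0, 1]
    eta2 = X[:, 1] ** 2 - 1.0 / 3.0                 # E[X^2] = 1/3 for U(0, 1)
    linear_predictor = eta1 + eta2 + Z @ beta       # right-hand side of (2.1)
    u = 1.0 / (1.0 + np.exp(-linear_predictor))     # inverse logit link
    Y = rng.binomial(1, u)                          # Bernoulli response, V(u) = u(1 - u)
    return Y, X, Z, u

Y, X, Z, u = simulate_gaplm(n=500, beta=np.array([1.0, -0.5, 0.0]))
```

Other members of the family are obtained by changing the link and the response distribution, for example a log link with Poisson counts, while the additive-plus-linear predictor stays the same.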
Let $\beta = (\beta_c^T, \beta_u^T)^T$ be a vector with $d = d_c + d_u$ components, where $\beta_c$ consists of the first $d_c$ parameters of $\beta$ (which we certainly wish to be in the selected model) and $\beta_u$ consists of the remaining $d_u$ parameters (for which we are unsure whether or not they should be included in the selected model). In what follows, we call the elements of $z$ corresponding to $\beta_c$ and $\beta_u$ the certain and exploratory variables, respectively. As in the literature on FIC, we consider a local misspecification framework where the true value of the parameter vector $\beta$ is $\beta_0 = (\beta_{c,0}^T, \delta^T/\sqrt{n})^T$, with $\delta$ being a $d_u \times 1$ vector; that is, the true model is away from the deduced model with a distance $O(1/\sqrt{n})$. This framework indicates that squared model bias and estimator variances are both of size $O(1/n)$, the most possible large-sample approximations. Some arguments related to this framework appear in Hjort and Claeskens (2003, 2006).

Denote by $\beta_S = (\beta_c^T, \beta_{u,S}^T)^T$ the parameter vector in the $S$th submodel, in the same sense as $\beta$, with $\beta_{u,S}$ being a $d_{u,S}$-subvector of $\beta_u$. Let $\pi_S$ be the projection matrix of size $d_{u,S} \times d_u$ mapping $\beta_u$ to $\beta_{u,S}$. With $d_u$ exploratory covariates, our setup allows $2^{d_u}$ extended models to choose among. However, it is not necessary to deal with all $2^{d_u}$ possible models and one is free to consider only a few relevant submodels (not necessarily nested or ordered) to be used in the model selection or averaging. A special example is the James–Stein-type estimator studied by Kim and White (2001), which is a weighted summand of the estimators based on the reduced model ($d_{u,S} = 0$) and the full model ($d_{u,S} = d_u$). So, the covariates in the $S$th submodel are $X$ and $\Pi_S Z$, where $\Pi_S = \mathrm{diag}(I_{d_c}, \pi_S)$. To save space, we generally ignore the dimensions of zero vectors/matrices and identity matrices, simply denoting them by $0$ and $I$, respectively. If necessary, we will write their dimensions explicitly. In the remainder of this section, we shall investigate polynomial spline estimation for $(\beta_{c,0}^T, 0)$ based on the $S$th submodel and establish a theoretical property for the resulting estimators.

Let $\eta_0 = \sum_{\alpha=1}^{p} \eta_{0,\alpha}(x_\alpha)$ be the true additive function and the covariate $X_\alpha$ be distributed on a compact interval $[a_\alpha, b_\alpha]$. Without loss of generality, we take all intervals $[a_\alpha, b_\alpha] = [0, 1]$ for $\alpha = 1, \ldots, p$. Noting (A.7) in Appendix A.2, under some smoothness assumptions in Appendix A.1, $\eta_0$ can be well approximated by spline functions. Let $S_n$ be the space of polynomial splines on $[0, 1]$ of degree $\ell \ge 1$. We introduce a knot sequence with $J$ interior knots,

$$k_{-\ell} = \cdots = k_{-1} = k_0 = 0 < k_1 < \cdots < k_J < 1 = k_{J+1} = \cdots = k_{J+\ell+1},$$

where $J \equiv J_n$ increases when sample size $n$ increases and the precise order is given in condition (C6). Then, $S_n$ consists of functions $\varsigma$ satisfying the following:
(i) $\varsigma$ is a polynomial of degree $\ell$ on each of the subintervals $[k_j, k_{j+1})$, $j = 0, \ldots, J_n - 1$, and the last subinterval is $[k_{J_n}, 1]$;
(ii) for $\ell \ge 2$, $\varsigma$ is $(\ell - 1)$-times continuously differentiable on $[0, 1]$.
For simplicity of proof, equally spaced knots are used. Let $h = 1/(J_n + 1)$ be the distance between two consecutive knots.
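The knot sequence and spline space just described can be sketched numerically. The code below builds equally spaced knots with spacing $h = 1/(J_n + 1)$, repeats each boundary knot (degree + 1) times, and evaluates a B-spline basis of that space by the Cox-de Boor recursion. The B-spline basis is one standard choice and is assumed here for illustration; the text above does not fix a particular basis, and the function names are hypothetical.

```python
import numpy as np

def knot_sequence(J, degree):
    """Equally spaced knots on [0, 1]: J interior knots, boundary knots repeated degree + 1 times."""
    h = 1.0 / (J + 1)                                # distance between consecutive knots
    interior = h * np.arange(1, J + 1)
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    m = len(knots) - 1
    # Degree 0: indicators of [k_j, k_{j+1}); the last nonempty interval is closed at 1.
    B = np.zeros((x.size, m))
    for j in range(m):
        if knots[j] < knots[j + 1]:
            B[:, j] = (x >= knots[j]) & (x < knots[j + 1])
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[x == knots[-1], last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, m - d))
        for j in range(m - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_next[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_next[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_next
    return B

knots = knot_sequence(J=4, degree=3)
B = bspline_basis(np.linspace(0.0, 1.0, 11), knots, degree=3)
```

With $J$ interior knots and degree $\ell$, the basis has $J + \ell + 1$ functions, so the dimension of the approximating space grows with $J_n$ and hence with the sample size.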
Let $(Y_i, X_i, Z_i)$, $i = 1, \ldots, n$, be independent copies of $(Y, X, Z)$. In the $S$th submodel, we consider the additive spline estimates of $\eta_0$ based on the independent random sample $(Y_i, X_i, \Pi_S Z_i)$, $i = 1, \ldots, n$. Let $G_n$ be the collection of functions