Stat 231 Handout on Regression Diagnostics
There are various kinds of residuals, ways of plotting them, and measures of "influence" on a regression that are meant to help in the black art of model building. We have already alluded to the fact that under the MLR model, we expect ordinary residuals
$$e_i = y_i - \hat{y}_i$$
to look like mean 0 normal random noise and that standardized or studentized residuals
$$e_i^* = \frac{e_i}{\text{standard error of } e_i}$$
should look like standard normal random noise.
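As a minimal sketch, assuming NumPy and a design matrix `X` whose first column is all ones (so $p = k+1$ columns), the two kinds of residuals above can be computed as follows. The standard error of $e_i$ used here is $\sqrt{MSE\,(1-h_{ii})}$, which relies on the leverage values $h_{ii}$ defined in the Leverage section of this handout. The function name and the simulated data are hypothetical.

```python
import numpy as np

def ordinary_and_standardized_residuals(X, y):
    """Return (e, e_star) for a least squares fit of y on the columns of X.

    X is assumed to contain an intercept column of ones, so p = k + 1.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                  # ordinary residuals e_i = y_i - yhat_i
    mse = e @ e / (n - p)                             # MSE = SSE / (n - k - 1)
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))    # leverages h_ii
    e_star = e / np.sqrt(mse * (1.0 - h))             # e_i divided by its standard error
    return e, e_star

# Hypothetical simulated data: k = 2 predictors, n = 30 cases.
rng = np.random.default_rng(0)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

e, e_star = ordinary_and_standardized_residuals(X, y)
```

Plotting `e_star` against fitted values or in a normal probability plot is then the usual visual check.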
Deleted Residuals and the PRESS Statistic
There is also the notion of deleted residuals. These are built on the idea that a model should not be terribly sensitive to individual data points used to fit it, or equivalently that one ought to be able to predict a response even without using that response to fit the model. Beginning with a particular form for a MLR model and $n$ data points, let
$\hat{y}_{(i)}$ = the value of $y_i$ predicted by a model fit to the other $(n-1)$ data points
(note that this is not necessarily $\hat{y}_i$). The $i$th deleted residual is
$$e_{(i)} = y_i - \hat{y}_{(i)}$$
and the hope is that if a model is a good one and not overly sensitive to the exact data vectors used to fit it, these shouldn't be ridiculously larger in magnitude than the regular residuals, $e_i$. The "prediction sum of squares" is a single number summary of these:
$$PRESS = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2$$
and one wants small values of this. (Note that $PRESS \ge SSE$, but one hopes that it is not too much larger.)
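A minimal sketch of the definition, assuming NumPy: each deleted residual is obtained by literally refitting the model without case $i$ and predicting the held-out response. The function names and the simulated data are hypothetical; the brute-force loop is chosen for transparency rather than speed.

```python
import numpy as np

def deleted_residuals(X, y):
    """e_(i) = y_i - yhat_(i), where yhat_(i) comes from refitting without case i."""
    n = len(y)
    e_del = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # every case except i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        e_del[i] = y[i] - X[i] @ beta_i               # predict the held-out response
    return e_del

def press(X, y):
    """Prediction sum of squares: the sum of squared deleted residuals."""
    return float(np.sum(deleted_residuals(X, y) ** 2))

# Hypothetical simulated data: one predictor plus an intercept.
rng = np.random.default_rng(1)
n = 25
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 3.0 - 1.5 * x1 + rng.normal(scale=0.5, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = float(np.sum((y - X @ beta) ** 2))
press_value = press(X, y)
print(f"SSE = {sse:.3f}, PRESS = {press_value:.3f}")
```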
This does not exhaust the ways in which people have suggested using the residual idea.It is possible to invent standardized/Studentized deleted residuals
$$e_{(i)}^* = \frac{e_{(i)}}{\text{standard error of } e_{(i)}}$$
and there are yet other possibilities.
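One common way to make the standardized deleted residual concrete (an assumption here, since the handout does not fix the choice) is to estimate $\sigma$ by the root MSE $s_{(i)}$ of the fit without case $i$. Because $y_i$ is not used in that fit, $\text{Var}(e_{(i)}) = \sigma^2\,(1 + x_i'(X_{(i)}'X_{(i)})^{-1}x_i)$. A minimal NumPy sketch, with a hypothetical function name and simulated data:

```python
import numpy as np

def standardized_deleted_residuals(X, y):
    """e_(i) divided by an estimated standard error, using the fit without case i."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        r = yi - Xi @ beta_i
        s_i = np.sqrt(r @ r / (n - 1 - p))            # root MSE of the fit without case i
        e_del = y[i] - X[i] @ beta_i                  # deleted residual e_(i)
        # y_i is independent of the fit without it, so
        # Var(e_(i)) = sigma^2 * (1 + x_i' (Xi'Xi)^{-1} x_i)
        se = s_i * np.sqrt(1.0 + X[i] @ np.linalg.solve(Xi.T @ Xi, X[i]))
        out[i] = e_del / se
    return out

# Hypothetical simulated data: k = 2 predictors, n = 30 cases.
rng = np.random.default_rng(5)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 0.5 + x1 - x2 + rng.normal(scale=0.4, size=n)

e_del_star = standardized_deleted_residuals(X, y)
```

With this choice of $s_{(i)}$, the result is what is often called the externally studentized residual.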
Partial Residual Plots (JMP "Effect Leverage Plots")
In somewhat nonstandard language, SAS/JMP makes what it calls "effect leverage plots" that accompany its "effect tests." These are based on another kind of residuals, sometimes called partial residuals. With $k$ predictor variables, I might think about understanding the importance of variable $j$ by considering residuals computed using only the other $k-1$ predictor variables to do prediction (i.e., using a reduced model not including $x_j$). Although it is nearly impossible to see this from their manual and help functions or from how the axes of the plots are labeled, the effect leverage plot in JMP for variable $j$ is essentially a plot of
$e_{(j)}(y_i)$ = the $i$th $y$ residual regressing on all predictor variables except $x_j$
versus
$e_{(j)}(x_{ji})$ = the $i$th $x_j$ residual regressing on all predictor variables except $x_j$
To be more precise, exactly what is plotted is
$$e_{(j)}(y_i) + \bar{y} \quad \text{versus} \quad e_{(j)}(x_{ji}) + \bar{x}_j$$
On these plots there is a horizontal line drawn at $\bar{y}$ (at $y$ partial residual equal to 0, i.e., $y$ perfectly predicted by all predictors excepting $x_j$). The vertical axis IS in the original $y$ units, but should not really be labeled as $y$, but rather as partial residual. The sum of squared vertical distances from the plotted points to this line is then SSE for a model without predictor $j$.
The horizontal plotting positions of the points are in the original $x_j$ units, but are essentially partial residuals of the $x_j$'s, NOT the $x_j$'s themselves. The horizontal center of the plot is at $\bar{x}_j$ (at $x_j$ partial residual equal to 0, i.e., $x_j$ perfectly predicted from all predictors except $x_j$). The non-horizontal line on the plots is in fact the least squares line through the plotted points. What is interesting is that the usual residuals from that least squares line are the residuals for the full MLR fit to the data. So the sum of the squared vertical distances from the points to the sloped line is then SSE for the full model. The larger the reduction in SSE from the horizontal line to the sloped one, the smaller the p-value for testing $H_0: \beta_j = 0$.
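The claims about the two lines on an effect leverage plot can be checked numerically. The following is a minimal NumPy sketch that computes the plotted points as described above (it is not JMP's code, and the function names and simulated data are hypothetical). It assumes column 0 of `X` is the intercept, so both partial residuals have mean 0 and the plot is centered at $(\bar{x}_j, \bar{y})$.

```python
import numpy as np

def residualize(A, v):
    """Residuals from the least squares regression of v on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def leverage_plot_points(X, y, j):
    """(horizontal, vertical) plotting positions for predictor column j of X."""
    others = np.delete(X, j, axis=1)          # intercept and all predictors except x_j
    ey = residualize(others, y)               # e_(j)(y_i)
    ex = residualize(others, X[:, j])         # e_(j)(x_ji)
    return ex + X[:, j].mean(), ey + y.mean()

# Hypothetical simulated data in which x2 is partly predictable from x1.
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.8 * x1 + 1.2 * x2 + rng.normal(scale=0.5, size=n)

hx, vy = leverage_plot_points(X, y, j=2)
slope, intercept = np.polyfit(hx, vy, 1)      # the sloped least squares line on the plot

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
sse_full = np.sum((y - X @ beta_full) ** 2)
sse_reduced = np.sum((vy - y.mean()) ** 2)    # squared distances to the horizontal line at ybar
sse_line = np.sum((vy - (intercept + slope * hx)) ** 2)
```

The slope of the fitted line matches the full-model coefficient of $x_j$, and `sse_line` matches the full-model SSE; both follow from the Frisch-Waugh-Lovell theorem, which the handout does not name.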
Highlighting a point on a JMP partial residual plot makes it bigger on the other plots and highlights it in the data table (for examination or, for example, potential exclusion). On the plots we can at least see which points are fit poorly by a model that excludes a given predictor, and the effect that adding that last predictor has on the prediction of each $y$. (Note that points near the center of the horizontal scale are ones whose $x_j$ can already be predicted from the other $x$'s, so adding $x_j$ to the prediction equation does not change their residuals much. Points far to the right or left of center have values of predictor $j$ that are unlike their predictions from the other $x$'s. These points tend both to influence more strongly the change in the model predictions as $x_j$ is added to the model, and to have their residuals more strongly affected than points in the middle of the plot (where $x_j$ can be predicted from the other $x$'s).)
Leverage
The notion of how much potential influence a single data point has on a fit is an important one. The JMP partial residual plot/"effect leverage" plot is aimed at addressing this issue by highlighting points with large $x_j$ partial residuals. Another notion of the same kind is based on the fact that there
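are $n^2$ numbers $h_{ii'}$ ($i = 1, \ldots, n$ and $i' = 1, \ldots, n$) depending upon the $n$ vectors $(x_{1i}, x_{2i}, \ldots, x_{ki})$ only (and not the $y$'s) so that each $\hat{y}_i$ is
$$\hat{y}_i = h_{i1}y_1 + h_{i2}y_2 + \cdots + h_{i,i-1}y_{i-1} + h_{ii}y_i + h_{i,i+1}y_{i+1} + \cdots + h_{in}y_n$$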
$h_{ii}$ is then a measure of how heavily $y_i$ is counted in its own prediction and is usually called the leverage of the corresponding data point. (JMP calls the set of $h_{ii}$ values the "hats.") It is a fact that $0 < h_{ii} < 1$ and $\sum_{i=1}^{n} h_{ii} = k+1$. So the $h_{ii}$'s average to $(k+1)/n$, and a plausible rule of thumb is that when a single $h_{ii}$ is more than twice this average value, the corresponding data point has an important (potentially influential) vector of predictor values $(x_{1i}, x_{2i}, \ldots, x_{ki})$.
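In matrix form the numbers $h_{ii'}$ are the entries of the "hat matrix" $H = X(X'X)^{-1}X'$, so that $\hat{y} = Hy$. A minimal NumPy sketch, assuming `X` includes an intercept column (the function name, the planted unusual value, and the simulated data are hypothetical), computes the leverages, checks that they sum to $k+1$, and applies the twice-the-average rule of thumb:

```python
import numpy as np

def hat_matrix(X):
    """H = X (X'X)^{-1} X'; it depends only on the predictors, and yhat = H y."""
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical simulated data: k = 2 predictors, n = 30 cases.
rng = np.random.default_rng(3)
n, k = 30, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x1[0] = 6.0                                    # planted unusual predictor value for case 0
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 + x1 + x2 + rng.normal(scale=0.4, size=n)

H = hat_matrix(X)
h = np.diag(H)                                 # leverages h_ii
cutoff = 2 * (k + 1) / n                       # twice the average leverage
flagged = np.flatnonzero(h > cutoff)
print("sum of h_ii:", round(float(h.sum()), 6), " flagged cases:", flagged)
```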
It is not at all obvious, but as it turns out, the $i$th deleted residual is $e_{(i)} = e_i/(1-h_{ii})$ and the PRESS statistic has the formula
$$PRESS = \sum_{i=1}^{n} \left(\frac{e_i}{1-h_{ii}}\right)^2$$
involving the leverage values. This shows that big PRESS occurs when big leverages are associated with large ordinary residuals.
Cook's D
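The identity $e_{(i)} = e_i/(1-h_{ii})$ means PRESS needs only one fit rather than $n$ refits. A minimal NumPy sketch (function names and simulated data are hypothetical) computes it both ways so the shortcut can be checked against brute-force leave-one-out refitting:

```python
import numpy as np

def press_shortcut(X, y):
    """Deleted residuals and PRESS from a single fit, via e_(i) = e_i / (1 - h_ii)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
    e_del = e / (1.0 - h)
    return e_del, float(np.sum(e_del ** 2))

def deleted_residuals_refit(X, y):
    """The same deleted residuals, computed the slow way by n separate refits."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ beta_i
    return out

# Hypothetical simulated data: k = 2 predictors, n = 35 cases.
rng = np.random.default_rng(6)
n = 35
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = -1.0 + 0.7 * x1 + 1.1 * x2 + rng.normal(scale=0.5, size=n)

e_del_fast, press_fast = press_shortcut(X, y)
e_del_slow = deleted_residuals_refit(X, y)
```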
The leverage $h_{ii}$ involves only predictors and no $y$'s. A proposal by Cook to measure the overall effect that case $i$ has on the regression is the statistic
$$D_i = \frac{h_{ii}}{(k+1)\,MSE}\left(\frac{e_i}{1-h_{ii}}\right)^2 = \frac{h_{ii}}{k+1}\left(\frac{e_{(i)}}{s}\right)^2$$
(abbreviating $\sqrt{MSE}$ as $s$), where large values of this identify points that, by virtue of either their leverage or their large (ordinary or) deleted residual, are "influential."
$D_i$ is Cook's Distance. The second expression for $D_i$ is a product of two ratios. The first of these is a "fraction of the overall total leverage due to the $i$th case" and the second is an approximation to the square of a standardized version of the $i$th deleted residual. So $D_i$ will be large if case $i$ is located near the "edge" of the data set in terms of the values of the predictors, AND has a $y$ that is poorly predicted if case $i$ is not included in the model fitting.
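A minimal NumPy sketch of $D_i$, assuming `X` has an intercept column so that $k+1$ equals the number of columns (function names, the planted case, and the simulated data are hypothetical). As a check it also uses an equivalent form not stated above, Cook's original definition in terms of the change in fitted coefficients when case $i$ is dropped: $D_i = (\hat{\beta} - \hat{\beta}_{(i)})'X'X(\hat{\beta} - \hat{\beta}_{(i)}) / ((k+1)\,MSE)$.

```python
import numpy as np

def cooks_distance(X, y):
    """D_i = h_ii / ((k+1) MSE) * (e_i / (1 - h_ii))^2, with k + 1 = X.shape[1]."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    mse = e @ e / (n - p)
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
    return h / (p * mse) * (e / (1.0 - h)) ** 2

def cooks_distance_refit(X, y):
    """Cook's D from the change in coefficients when each case is dropped in turn."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    mse = e @ e / (n - p)
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta - beta_i
        d[i] = diff @ XtX @ diff / (p * mse)
    return d

# Hypothetical simulated data with one planted influential case.
rng = np.random.default_rng(4)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x1[0] = 6.0                                    # case 0 sits at the edge of the predictor data ...
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)
y[0] += 5.0                                    # ... AND is poorly predicted by the other cases

D = cooks_distance(X, y)
```

In this simulated example case 0 combines high leverage with a large deleted residual, so it should have by far the largest $D_i$.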