Variance inflation factor
In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.
A measure of the amount of multicollinearity in a set of multiple regression variables. The presence of multicollinearity within the set of independent variables can cause a number of problems in understanding the significance of individual independent variables in the regression model. Using variance inflation factors helps to identify multicollinearity issues so that the model can be adjusted.
Investopedia Says:
The variance inflation factor allows a quick measure of how much a variable is contributing to the standard error in the regression. When significant multicollinearity issues exist, the variance inflation factor will be very large for the variables involved. After the variables are identified, there are several approaches that can be used to eliminate or combine collinear variables, resolving the multicollinearity issue.
Definition
Consider the following linear model with k independent variables:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.$$
The standard error of the estimate of $\beta_j$ is the square root of the $(j+1, j+1)$ element of $s^2 (X'X)^{-1}$, where $s$ is the standard error of the estimate (SEE) (note that SEE$^2$ is an unbiased estimator of the true variance of the error term, $\sigma^2$); $X$ is the regression design matrix, a matrix such that $X_{i,j+1}$ is the value of the $j$th independent variable for the $i$th case or observation, and such that $X_{i,1}$ equals 1 for all $i$. It turns out that the square of this standard error, the estimated variance of the estimate of $\beta_j$, can be equivalently expressed as
$$\widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{s^2}{(n-1)\,\widehat{\operatorname{var}}(X_j)} \cdot \frac{1}{1 - R_j^2},$$
where $R_j^2$ is the multiple $R^2$ for the regression of $X_j$ on the other covariates (a regression that does not involve the response variable $Y$). This identity separates the influences of several distinct factors on the variance of the coefficient estimate:
∙ $s^2$: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
∙ $n$: greater sample size results in proportionately less variance in the coefficient estimates
∙ $\widehat{\operatorname{var}}(X_j)$: greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate
The remaining term, $1 / (1 - R_j^2)$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector $X_j$ is orthogonal to each column of the design matrix for the regression of $X_j$ on the other covariates. By contrast, the VIF is greater than 1 when the vector $X_j$ is not orthogonal to all columns of the design matrix for the regression of $X_j$ on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables (that is, we could scale each variable $X_j$ by a constant $c_j$ without changing the VIF).
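The decomposition described above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the helper names `with_intercept` and `vif_values`, the sample size, and the coefficients are illustrative choices rather than part of the definition. It compares the diagonal of $s^2 (X'X)^{-1}$ with the product of $s^2 / ((n-1)\,\widehat{\operatorname{var}}(X_j))$ and the VIF, and then rescales the covariates to illustrate the scale invariance.

```python
import numpy as np


def with_intercept(z):
    """Prepend a column of ones so the regression includes a constant."""
    return np.column_stack([np.ones(len(z)), z])


def vif_values(covariates):
    """VIF for each column: 1 / (1 - R_j^2) from regressing it on the others."""
    out = []
    for j in range(covariates.shape[1]):
        target = covariates[:, j]
        design = with_intercept(np.delete(covariates, j, axis=1))
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Simulated data (assumption): x2 is built to be correlated with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
covariates = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

# Direct route: diagonal of s^2 (X'X)^{-1}, dropping the intercept entry.
X = with_intercept(covariates)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
s2 = resid @ resid / (n - X.shape[1])
direct = s2 * np.diag(np.linalg.inv(X.T @ X))[1:]

# Factored route: s^2 / ((n - 1) var(X_j)) times the VIF.
vifs = vif_values(covariates)
factored = s2 / ((n - 1) * covariates.var(axis=0, ddof=1)) * vifs

# Rescaling each covariate by a constant should leave the VIFs unchanged.
rescaled_vifs = vif_values(covariates * np.array([10.0, 0.01, 3.0]))

print(np.allclose(direct, factored), np.allclose(vifs, rescaled_vifs))
```

The two routes should agree up to floating-point error, since the identity is algebraic rather than approximate.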
Calculation and analysis
The VIF can be calculated and analyzed in three steps:
Step one
Calculate k different VIFs, one for each $X_i$, by first running an ordinary least squares regression that has $X_i$ as a function of all the other explanatory variables in the first equation.
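If i = 1, for example, the equation would be
$$X_1 = \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k + c_0 + e,$$
where $c_0$ is a constant, the $\alpha$ terms are regression coefficients, and $e$ is the error term.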
Step two
Then, calculate the VIF for $\hat{\beta}_i$ with the following formula:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$
where $R_i^2$ is the coefficient of determination of the regression equation in step one.
Step three
Analyze the magnitude of multicollinearity by considering the size of $\mathrm{VIF}_i$. A common rule of thumb is that if $\mathrm{VIF}_i > 5$ then multicollinearity is high. A cutoff value of 10 has also been proposed (see the Kutner book referenced below).
Some software calculates the tolerance, which is just the reciprocal of the VIF. The choice of which to use is a matter of personal preference of the researcher.
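The three steps can be sketched in code as follows. This is an illustrative implementation, assuming NumPy; the function name `vif_table`, the simulated columns, and the default cutoff of 10 (one of the proposed values mentioned above) are assumptions made for the example, not a standard API.

```python
import numpy as np


def vif_table(covariates, cutoff=10.0):
    """Return one (vif, tolerance, flagged) tuple per column of covariates."""
    covariates = np.asarray(covariates, dtype=float)
    n, k = covariates.shape
    rows = []
    for i in range(k):
        # Step one: regress X_i on the other covariates plus a constant.
        target = covariates[:, i]
        others = np.delete(covariates, i, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        # Step two: VIF_i = 1 / (1 - R_i^2); the tolerance is its reciprocal.
        vif = 1.0 / (1.0 - r2)
        # Step three: compare the VIF with the chosen cutoff.
        rows.append((vif, 1.0 / vif, bool(vif > cutoff)))
    return rows


# Simulated example (assumption): c is nearly a linear combination of a and b.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.1 * rng.normal(size=500)
collinear = vif_table(np.column_stack([a, b, c]))
independent = vif_table(rng.normal(size=(500, 3)))

for name, (vif, tol, flagged) in zip("abc", collinear):
    print(f"{name}: VIF={vif:.1f} tolerance={tol:.4f} high={flagged}")
```

Passing a different `cutoff` changes only the flag in step three; the VIF and tolerance values themselves do not depend on it.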
Interpretation
The square root of the variance inflation factor tells you how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other independent variables in the equation.
Example
If the variance inflation factor of an independent variable were 5.27 (√5.27 ≈ 2.3), this means that the standard error for the coefficient of that independent variable is 2.3 times as large as it would be if that independent variable were uncorrelated with the other independent variables.
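The arithmetic in this example can be reproduced with a short script. It is only a sketch of the calculation; the variable names are illustrative. It also recovers the $R_j^2$ and tolerance that a VIF of 5.27 implies, using the relations VIF = 1 / (1 − R²) and tolerance = 1 / VIF.

```python
import math

vif = 5.27

# Factor by which the coefficient's standard error is inflated.
se_inflation = math.sqrt(vif)

# R^2 of the auxiliary regression consistent with this VIF: R^2 = 1 - 1/VIF.
implied_r2 = 1.0 - 1.0 / vif

# Tolerance is the reciprocal of the VIF.
tolerance = 1.0 / vif

print(f"SE inflation: {se_inflation:.2f}")
print(f"implied R^2:  {implied_r2:.3f}")
print(f"tolerance:    {tolerance:.3f}")
```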
References
∙ Longnecker, M.T. & Ott, R.L.: A First Course in Statistical Methods, page 615. Thomson Brooks/Cole, 2004.
∙ Studenmund, A.H.: Using Econometrics: A Practical Guide, 5th Edition, pages 258–259. Pearson International Edition, 2006.
∙ Hair, J.F., Anderson, R., Tatham, R.L. & Black, W.C.: Multivariate Data Analysis. Prentice Hall, Upper Saddle River, N.J., 2006.