
Modeling and Interpreting Interactions in Multiple Regression
Donald F. Burrill
The Ontario Institute for Studies in Education
Toronto, Ontario Canada
A method of constructing interactions in multiple regression models is described which produces interaction variables that are uncorrelated with their component variables and with any lower-order interaction variables. The method is, in essence, a partial Gram-Schmidt orthogonalization that makes use of standard regression procedures, requiring neither special programming nor the use of special-purpose programs before proceeding with the analysis. Advantages of the method include clarity of tests of regression coefficients, and efficiency of winnowing out uninformative predictors (in the form of interactions) in reducing a full model to a satisfactory reduced model. The method is illustrated by applying it to a convenient data set.
PRELIMINARIES
In a linear model representing the variation in a dependent variable Y as a linear function of several explanatory variables, interaction between two explanatory variables X and W can be represented by their product: that is, by the variable created by multiplying them together. Algebraically such a model is represented by Equation [1]:
Y = a + b1 X + b2 W + b3 XW + e .    [1]
When X and W are category systems, Eq. [1] describes a two-way analysis of variance (AOV) model; when X and W are (quasi-)continuous variables, Eq. [1] describes a multiple linear regression (MLR) model.
In AOV contexts, the existence of an interaction can be described as a difference between differences: the difference in means between two levels of X at one value of W is not the same as the difference in the corresponding means at another value of W, and this not-the-same-ness constitutes the interaction between X and W; it is quantified by the value of b3.
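For instance, suppose X and W are each coded 0 and 1 (a coding chosen here purely for illustration). Equation [1] then gives four cell means: a (X = 0, W = 0), a + b1 (X = 1, W = 0), a + b2 (X = 0, W = 1), and a + b1 + b2 + b3 (X = 1, W = 1); so the difference between differences is

[(a + b1 + b2 + b3) - (a + b2)] - [(a + b1) - a] = b3 .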
In MLR contexts, an interaction implies a change in the slope (of the regression of Y on X) from one value of W to another value of W (or, equivalently, a change in the slope of the regression of Y on W for different values of X): in a two-predictor regression with interaction, the response surface is not a plane but a twisted surface (like "a bent cookie tin", in Darlington's (1990) phrase). The change of slope is quantified by the value of b3.
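To see this algebraically, hold W fixed in Eq. [1]: the expected value of Y is then (a + b2 W) + (b1 + b3 W) X, so the slope of the regression of Y on X is b1 + b3 W, and each unit change in W changes that slope by b3. (By symmetry, the slope of the regression of Y on W is b2 + b3 X.)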
INTRODUCTION
In attempting to fit a model (like Eq. [1]) to a set of data, we may proceed in either of two basic ways:
1. Start with a model that contains all available candidates as predictors, then simplify the model by discarding candidates that do not contribute to explaining the variability in the dependent variable; or
2. Start with a simple model and elaborate on it by adding additional candidates.
In either case we will wish (at any stage in the analysis) to compare a "full model" to a "reduced model", following the usage introduced by Bottenberg & Ward (1963) (or an "augmented model" to a "compact model", in Judd & McClelland's (1989) usage). If the difference in variance explained is negligible, we will prefer the reduced model and may consider simplifying it further. If the difference is large enough to be interesting, we suspect the reduced model to be oversimplified and will prefer the full model; we may then wish to consider an intermediate model, or a model even more elaborate than the present full model.
In our context, the "full model" will initially contain as predictors all the original variables of interest and all possible interactions among them.
Traditionally, all possible interactions are routinely represented in AOV designs (one may of course hope that many of them do not exist!), and in computer programs designed to produce AOV output; while interactions of any kind are routinely not represented in MLR designs, and in general have to be explicitly constructed (or at least explicitly represented) in computer programs designed to produce multiple regression analyses. This may be due in part to the fact that values of the explanatory variables (commonly called "factors") in AOV are constrained to a small number of nicely spaced values, so that (for balanced AOV designs) the factors themselves are mutually orthogonal, and their products (interaction effects) are orthogonal to them.
Explanatory variables (commonly called "predictors") in MLR, on the other hand, are usually not much constrained, and are seldom orthogonal to each other, let alone to their products. One consequence of this is that product variables (like XW) tend to be correlated rather strongly with the simple variables that define them: Darlington (1990, Sec. 13.5.6) points out that the products and squares of raw predictors in a multiple regression analysis are often highly correlated with each other, and with the original predictors (also called "linear effects"). This is seldom a difficult problem with simple models like Eq. [1], but as the number of raw predictors increases the potential number of product variables (to represent three-way interactions like VWX, four-way interactions like UVWX, and so on) increases exponentially; and the intercorrelations of raw product variables with other variables tend to increase as the number of simple variables in the product increases.
As a result, more complex models tend to exhibit multicollinearity, even though the idea of an interaction is logically independent of the simple variables (and lower-order interactions) to which it is related. This phenomenon may reasonably be called spurious multicollinearity. The point of this paper is that spurious multicollinearity can be made to vanish, permitting the investigator to detect interaction effects (if they exist) uncontaminated by such artifacts.
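The artifact is easy to reproduce. The following sketch (in Python with NumPy, which is not part of the original paper; the simulated data and variable names are invented purely for illustration) draws two moderately correlated, strictly positive predictors and shows that their raw product correlates strongly with each of them:

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(60, 10, n)            # a strictly positive "raw" predictor
w = 0.5 * x + rng.normal(30, 8, n)   # moderately correlated with x
xw = x * w                           # raw product ("interaction") variable

print(np.corrcoef(x, xw)[0, 1])      # typically well above 0.8
print(np.corrcoef(w, xw)[0, 1])      # likewise

Centering x and w before forming the product reduces these correlations; the orthogonalization described below removes them entirely.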
The high intercorrelations lead to several difficulties:
1. The set of predictors and all their implied interactions (in a "full model") may explain an impressive amount of the variance of the dependent variable Y, while none of the regression coefficients are significantly different from zero.
2. The regression solution may be unstable, due to extremely low tolerances (or extremely high variance inflation factors (VIFs)) for some or all of the predictors; a sketch of the tolerance/VIF computation follows this list.
3. As a corollary of (2.), the computing package used may refuse to fit the full model.
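For reference, the tolerance of a predictor is 1 minus the squared multiple correlation obtained by regressing it on all the other predictors, and its VIF is the reciprocal of that tolerance. A minimal sketch of the computation (in Python with NumPy; not part of the original paper, and the function name is invented):

import numpy as np

def vifs(X):
    """Variance inflation factor for each column of the n-by-p predictor matrix X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # regress column j on the others
        fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)                                   # VIF = 1 / tolerance
    return out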
An example illustrating all of these characteristics is displayed in Exhibit 1.
EXHIBIT 1
In this example four raw variables (P1, G, K, S) and their interactions (calculated as the raw products of the corresponding variables) are used to predict the dependent variable (P2). P1 and P2 are continuous variables (pulse rates before and after a treatment); G, K, and S are dichotomies coded [1,2]: G indicates treatment (1 = experimental, 2 = control); K indicates smoking habits (1 = smoker, 2 = non-smoker); S indicates sex (1 = male, 2 = female).
The computations were carried out in Minitab. (Similar results occur in other statistical computing packages.) The first output from the regression command (calling for 15 predictors) was

* P1.G.S.K is highly correlated with other X variables
* P1.G.S.K has been removed from the equation

followed by

* NOTE * P1 is highly correlated with other predictor variables

and a similar message for each of the other predictors remaining in the equation. The values of the regression coefficients, their standard errors, t-ratios, p-values, and variance inflation factors (VIF) are displayed in the table below, followed by the analysis of variance table.
Predictor   Coefficient   Std. error        t        p       VIF
Constant         131.7        111.7      1.18    0.242
P1              -1.345        1.537     -0.87    0.385     440.6
G               -38.51        51.12     -0.75    0.454     957.6
K               -79.86        65.49     -1.22    0.226    1412.1
S                19.63        63.00      0.31    0.756    1454.4
G.S             -26.29        38.88     -0.68    0.501    2906.0
G.K              21.96        24.36      0.90    0.370    1230.8
S.K              22.49        31.64      0.71    0.479    1953.4
G.S.K            1.542        9.798      0.16    0.875     842.6
P1.G            0.8671       0.6807      1.27    0.207    1101.9
P1.K            1.2570       0.9498      1.32    0.190    1845.6
P1.S            0.3258       0.7536      0.43    0.667    1673.4
P1.G.S          0.0113       0.3787      0.03    0.976    1784.3
P1.G.K         -0.4236       0.3788     -1.12    0.267    1663.1
P1.S.K         -0.2912       0.4085     -0.71    0.478    2078.3
Source         DF        SS        MS       F       p
Regression     14   22034.0    1573.9   26.60   0.000
Error          77    4556.0      59.2
Total          91   26590.0
s = 7.692      R-sq = 82.9%      R-sq(adj) = 79.8%

ORTHOGONALIZED PREDICTORS
These difficulties can be avoided entirely by orthogonalizing the product and power terms with respect to the linear effects from which they are constructed. This point is discussed in some detail (with respect to predictors in general) in Chapter 5 of Draper and Smith (1966, 1981), and the Gram-Schmidt orthogonalizing procedure is described in their Sec. 5.7. Because that discussion is couched in matrix algebra, it is largely inaccessible to anyone who lacks a strong mathematical background. Also, they write in terms of orthogonalizing the whole X matrix; but in fact a partial orthogonalization will often suffice.
In presenting the Gram-Schmidt procedure, Draper and Smith (ibid.) observe that the predictors can be ordered in importance, at least in principle -- that is, the investigator may be interested first in the effect attributable to X1, then in the additional variance that can be explained by X2, then in whatever increment is due to X3, and so on. For the example with which they illustrate the procedure (generating orthogonal polynomials), this assumption is reasonable.
However, the investigator may not always have (or be willing to impose) a strict a priori ordering on all the predictors. Suppose that we have four predictors U, V, W, X, which are moderately intercorrelated; and that we are interested in a model that includes all the two-way interactions between them, all the three-way interactions, and the four-way interaction. Now a natural ordering begins to emerge, but only a partial one: we will wish to see what effects are attributable to the linear terms alone, then what additional effects are due to the two-way interaction terms, then the three-way terms, and so on. In general, we are unlikely to be interested in retaining (e.g.) two-way interactions in the final model unless they provide an improvement over a model containing the original variables alone.
A sequence of non-orthogonalized models.
One way of proceeding in such a case is to fit several models in a hierarchical sequence of formal models, using as predictors:
1. The original variables only.
2. The original variables and the two-way interactions.
3. The original variables and the two- and three-way interactions.
4. The original variables and all interactions.
Then make the usual comparison between models (change in sum of squares divided by change in degrees of freedom, and that ratio divided by the residual mean square for the more elaborate model, for an F-test of the hypothesis that the additional terms did not explain any more of the variance in the dependent variable).
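In symbols, writing SSE for the residual (error) sum of squares and df for the residual degrees of freedom of each model, the comparison is

F = [ (SSE_reduced - SSE_full) / (df_reduced - df_full) ] / [ SSE_full / df_full ] ,

referred to an F distribution with (df_reduced - df_full) and df_full degrees of freedom. In the Exhibit 1 data, for example, the denominator for any such comparison against the full model would be its error mean square, 59.2 on 77 df.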
One drawback to proceeding thus is that not all statistical packages will perform this F-test automatically, leaving the investigator to work it out on his own. Another drawback is that, if the three-way interaction terms (for example) do add significantly to the variance explained, it is then necessary to remove them (or add them, depending on the starting model used) one at a time in successive models, to find out precisely which of them is (or are) producing the effect.
This procedure, which is recommended by many authors (e.g., Aiken & West, 1991), requires a series of regression analyses. If, as may well be expected, the interactions are strongly correlated with the linear effects (the original variables) or with each other, there still may be some lurking ambiguity in interpreting the regression coefficients.
A single orthogonalized model.
However, if all of the interactions have been orthogonalized with respect to the lower-order terms, one need only fit the full model. Then the standard t-tests of the regression coefficients will indicate directly which predictors (original variables and interactions) contribute significantly to explaining variance in Y, which do not, and which (if any) are borderline cases for which some further investigation may be useful.
By "orthogonalized with respect to the lower-order terms" I mean that each interaction variable (originally the raw product of the corresponding original variables) is reprented by the residual part of that product, after the original variables and any lower-order interaction variables have been partialed out of it. Conquently every such variable correlates zero with all the lower-order variables, and may be thought of as a "pure interaction" effect at its own level.
