
Modeling and Interpreting Interactions in Multiple Regression
Donald F. Burrill
The Ontario Institute for Studies in Education
Toronto, Ontario Canada
A method of constructing interactions in multiple regression models is described which produces interaction variables that are uncorrelated with their component variables and with any lower-order interaction variables. The method is, in essence, a partial Gram-Schmidt orthogonalization that makes use of standard regression procedures, requiring neither special programming nor the use of special-purpose programs before proceeding with the analysis. Advantages of the method include clarity of tests of regression coefficients, and efficiency of winnowing out uninformative predictors (in the form of interactions) in reducing a full model to a satisfactory reduced model. The method is illustrated by applying it to a convenient data set.
PRELIMINARIES
In a linear model representing the variation in a dependent variable Y as a linear function of several explanatory variables, interaction between two explanatory variables X and W can be represented by their product: that is, by the variable created by multiplying them together. Algebraically such a model is represented by Equation [1]:
Y = a + b1 X + b2 W + b3 XW + e .    [1]
When X and W are category systems, Eq. [1] describes a two-way analysis of variance (AOV) model; when X and W are (quasi-)continuous variables, Eq. [1] describes a multiple linear regression (MLR) model.
In AOV contexts, the existence of an interaction can be described as a difference between differences: the difference in means between two levels of X at one value of W is not the same as the difference in the corresponding means at another value of W, and this not-the-same-ness constitutes the interaction between X and W; it is quantified by the value of b3.
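For instance, suppose X and W are each coded 0 and 1 (a coding chosen here purely for illustration). Equation [1] then gives four cell means: a (X = 0, W = 0), a + b1 (X = 1, W = 0), a + b2 (X = 0, W = 1), and a + b1 + b2 + b3 (X = 1, W = 1); so the difference between differences is

[(a + b1 + b2 + b3) - (a + b2)] - [(a + b1) - a] = b3 .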
In MLR contexts, an interaction implies a change in the slope (of the regression of Y on X) from one value of W to another value of W (or, equivalently, a change in the slope of the regression of Y on W for different values of X): in a two-predictor regression with interaction, the response surface is not a plane but a twisted surface (like "a bent cookie tin", in Darlington's (1990) phrase). The change of slope is quantified by the value of b3.
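To see this algebraically, hold W fixed in Eq. [1]: the expected value of Y is then (a + b2 W) + (b1 + b3 W) X, so the slope of the regression of Y on X is b1 + b3 W, and each unit change in W changes that slope by b3. (By symmetry, the slope of the regression of Y on W is b2 + b3 X.)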
INTRODUCTION
In attempting to fit a model (like Eq. [1]) to a set of data, we may proceed in either of two basic ways:
1. Start with a model that contains all available candidates as predictors, then simplify the model by discarding candidates that do not contribute to explaining the variability in the dependent variable; or
2. Start with a simple model and elaborate on it by adding additional candidates.
In either case we will wish (at any stage in the analysis) to compare a "full model" to a "reduced model", following the usage introduced by Bottenberg & Ward (1963) (or an "augmented model" to a "compact model", in Judd & McClelland's (1989) usage). If the difference in variance explained is negligible, we will prefer the reduced model and may consider simplifying it further. If the difference is large enough to be interesting, we suspect the reduced model to be oversimplified and will prefer the full model; we may then wish to consider an intermediate model, or a model even more elaborate than the present full model.
In our context, the "full model" will initially contain as predictors all the original variables of interest and all possible interactions among them.
Traditionally, all possible interactions are routinely represented in AOV designs (one may of course hope that many of them do not exist!), and in computer programs designed to produce AOV output; while interactions of any kind are routinely not represented in MLR designs, and in general have to be explicitly constructed (or at least explicitly represented) in computer programs designed to produce multiple regression analyses. This may be due in part to the fact that values of the explanatory variables (commonly called "factors") in AOV are constrained to a small number of nicely spaced values, so that (for balanced AOV designs) the factors themselves are mutually orthogonal, and their products (interaction effects) are orthogonal to them.
Explanatory variables (commonly called "predictors") in MLR, on the other hand, are usually not much constrained, and are seldom orthogonal to each other, let alone to their products. One consequence of this is that product variables (like XW) tend to be correlated rather strongly with the simple variables that define them: Darlington (1990, Sec. 13.5.6) points out that the products and squares of raw predictors in a multiple regression analysis are often highly correlated with each other, and with the original predictors (also called "linear effects"). This is seldom a difficult problem with simple models like Eq. [1], but as the number of raw predictors increases the potential number of product variables (to represent three-way interactions like VWX, four-way interactions like UVWX, and so on) increases exponentially; and the intercorrelations of raw product variables with other variables tend to increase as the number of simple variables in the product increases.
As a result, more complex models tend to exhibit multicollinearity, even though the idea of an interaction is logically independent of the simple variables (and lower-order interactions) to which it is related. This phenomenon may reasonably be called spurious multicollinearity. The point of this paper is that spurious multicollinearity can be made to vanish, permitting the investigator to detect interaction effects (if they exist) uncontaminated by such artifacts.
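The artifact is easy to reproduce. The following sketch (in Python with NumPy, which is not part of the original paper; the simulated data and variable names are invented purely for illustration) draws two moderately correlated, strictly positive predictors and shows that their raw product correlates strongly with each of them:

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(60, 10, n)            # a strictly positive "raw" predictor
w = 0.5 * x + rng.normal(30, 8, n)   # moderately correlated with x
xw = x * w                           # raw product ("interaction") variable

print(np.corrcoef(x, xw)[0, 1])      # typically well above 0.8
print(np.corrcoef(w, xw)[0, 1])      # likewise

Centering x and w before forming the product reduces these correlations; the orthogonalization described below removes them entirely.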
The high intercorrelations lead to several difficulties:
1. The set of predictors and all their implied interactions (in a "full model") may explain an impressive amount of the variance of the dependent variable Y, while none of the regression coefficients are significantly different from zero.
2. The regression solution may be unstable, due to extremely low tolerances (or extremely high variance inflation factors (VIFs)) for some or all of the predictors; a sketch of the tolerance/VIF computation follows this list.
3. As a corollary of (2.), the computing package used may refuse to fit the full model.
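For reference, the tolerance of a predictor is 1 minus the squared multiple correlation obtained by regressing it on all the other predictors, and its VIF is the reciprocal of that tolerance. A minimal sketch of the computation (in Python with NumPy; not part of the original paper, and the function name is invented):

import numpy as np

def vifs(X):
    """Variance inflation factor for each column of the n-by-p predictor matrix X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # regress column j on the others
        fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)                                   # VIF = 1 / tolerance
    return out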
An example illustrating all of these characteristics is displayed in Exhibit 1.
EXHIBIT 1
In this example four raw variables (P1, G, K, S) and their interactions (calculated as the raw products of the corresponding variables) are used to predict the dependent variable (P2). P1 and P2 are continuous variables (pulse rates before and after a treatment); G, K, and S are dichotomies coded [1,2]: G indicates treatment (1 = experimental, 2 = control); K indicates smoking habits (1 = smoker, 2 = non-smoker); S indicates sex (1 = male, 2 = female).
The computations were carried out in Minitab. (Similar results occur in other statistical computing packages.) The first output from the regression command (calling for 15 predictors) was

* P1.G.S.K is highly correlated with other X variables
* P1.G.S.K has been removed from the equation

followed by

* NOTE * P1 is highly correlated with other predictor variables

and a similar message for each of the other predictors remaining in the equation. The values of the regression coefficients, their standard errors, t-ratios, p-values, and variance inflation factors (VIF) are displayed in the table below, followed by the analysis of variance table.
Predictor   Coefficient   Std. error        t        p       VIF
Constant         131.7        111.7      1.18    0.242
P1              -1.345        1.537     -0.87    0.385     440.6
G               -38.51        51.12     -0.75    0.454     957.6
K               -79.86        65.49     -1.22    0.226    1412.1
S                19.63        63.00      0.31    0.756    1454.4
G.S             -26.29        38.88     -0.68    0.501    2906.0
G.K              21.96        24.36      0.90    0.370    1230.8
S.K              22.49        31.64      0.71    0.479    1953.4
G.S.K            1.542        9.798      0.16    0.875     842.6
P1.G            0.8671       0.6807      1.27    0.207    1101.9
P1.K            1.2570       0.9498      1.32    0.190    1845.6
P1.S            0.3258       0.7536      0.43    0.667    1673.4
P1.G.S          0.0113       0.3787      0.03    0.976    1784.3
P1.G.K         -0.4236       0.3788     -1.12    0.267    1663.1
P1.S.K         -0.2912       0.4085     -0.71    0.478    2078.3
Source         DF        SS        MS       F       p
Regression     14   22034.0    1573.9   26.60   0.000
Error          77    4556.0      59.2
Total          91   26590.0
s = 7.692      R-sq = 82.9%      R-sq(adj) = 79.8%

ORTHOGONALIZED PREDICTORS
These difficulties can be avoided entirely by orthogonalizing the product and power terms with respect to the linear effects from which they are constructed. This point is discussed in some detail (with respect to predictors in general) in Chapter 5 of Draper and Smith (1966, 1981), and the Gram-Schmidt orthogonalizing procedure is described in their Sec. 5.7. Because that discussion is couched in matrix algebra, it is largely inaccessible to anyone who lacks a strong mathematical background. Also, they write in terms of orthogonalizing the whole X matrix; but in fact a partial orthogonalization will often suffice.
In presenting the Gram-Schmidt procedure, Draper and Smith (ibid.) observe that the predictors can be ordered in importance, at least in principle -- that is, the investigator may be interested first in the effect attributable to X1, then in the additional variance that can be explained by X2, then in whatever increment is due to X3, and so on. For the example with which they illustrate the procedure (generating orthogonal polynomials), this assumption is reasonable.
However, the investigator may not always have (or be willing to impose) a strict a priori ordering on all the predictors. Suppose that we have four predictors U, V, W, X, which are moderately intercorrelated; and that we are interested in a model that includes all the two-way interactions between them, all the three-way interactions, and the four-way interaction. Now a natural ordering begins to emerge, but only a partial one: we will wish to see what effects are attributable to the linear terms alone, then what additional effects are due to the two-way interaction terms, then the three-way terms, and so on. In general, we are unlikely to be interested in retaining (e.g.) two-way interactions in the final model unless they provide an improvement over a model containing the original variables alone.
A sequence of non-orthogonalized models.
One way of proceeding in such a case is to fit several models in a hierarchical sequence of formal models, using as predictors:
1. The original variables only.
2. The original variables and the two-way interactions.
3. The original variables and the two- and three-way interactions.
4. The original variables and all interactions.
Then make the usual comparison between models (change in sum of squares divided by change in degrees of freedom, and that ratio divided by the residual mean square for the more elaborate model, for an F-test of the hypothesis that the additional terms did not explain any more of the variance in the dependent variable).
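In symbols, writing SSE for the residual (error) sum of squares and df for the residual degrees of freedom of each model, the comparison is

F = [ (SSE_reduced - SSE_full) / (df_reduced - df_full) ] / [ SSE_full / df_full ] ,

referred to an F distribution with (df_reduced - df_full) and df_full degrees of freedom. In the Exhibit 1 data, for example, the denominator for any such comparison against the full model would be its error mean square, 59.2 on 77 df.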
One drawback to proceeding thus is that not all statistical packages will perform this F-test automatically, leaving the investigator to work it out on his own. Another drawback is that, if the three-way interaction terms (for example) do add significantly to the variance explained, it is then necessary to remove them (or add them, depending on the starting model used) one at a time in successive models, to find out precisely which of them is (or are) producing the effect.
This procedure, which is recommended by many authors (e.g., Aiken & West, 1991), requires a series of regression analyses. If, as may well be expected, the interactions are strongly correlated with the linear effects (the original variables) or with each other, there still may be some lurking ambiguity in interpreting the regression coefficients.
A single orthogonalized model.
However, if all of the interactions have been orthogonalized with respect to the lower-order terms, one need only fit the full model. Then the standard t-tests of the regression coefficients will indicate directly which predictors (original variables and interactions) contribute significantly to explaining variance in Y, which do not, and which (if any) are borderline cases for which some further investigation may be useful.
By "orthogonalized with respect to the lower-order terms" I mean that each interaction variable (originally the raw product of the corresponding original variables) is reprented by the residual part of that product, after the original variables and any lower-order interaction variables have been partialed out of it. Conquently every such variable correlates zero with all the lower-order variables, and may be thought of as a "pure interaction" effect at its own level.
