Moderated Regression

Draft last revised Thursday, October 22, 2009

*** These notes are still in "Draft" stage ***

Consider the classic mediational model (Baron & Kenny, 1986), that of the independent variable predicting the dependent variable "through" a hypothesized mediator:

IV -> MEDIATOR -> DV

Suppose that we would like to learn whether the above mediational model holds across levels of a moderator variable that has 5 mutually exclusive categories. That is, we would like to learn whether the mediational model interacts with another variable. To test this, we can use a macro in SPSS that will provide us with conditional indirect effects, and will inform us of the levels of the moderator for which the mediational model holds or does not hold. Testing the above model using an SEM package such as AMOS or LISREL would provide us with an even richer analysis. In the current notes, however, we will consider how to address various questions of moderation that arise in the above model, as a first and somewhat primitive attempt to tease apart various aspects of the mediational model and look at them separately.

For instance, suppose we find evidence that mediation holds across all levels of the moderator. Although this is an interesting finding, we may want to "dig in" a bit in a post-hoc sense to learn more about the above model's paths across varying levels of the moderator. Again, this is best handled by SEM, but it is very instructive to see how this problem can be at least partially addressed (or approached) using moderated regression where the moderator is a categorical variable (5 mutually exclusive categories). The remainder of these notes discusses how to conduct moderated regression. Consider the following question:

Q: Does the IV predict the mediator consistently at each level of the moderating variable?

The above question is one of an interaction between the IV and the hypothesized moderator. The mediator is now the dependent variable. So, in brief, our function statement (where "E" is error) is the following:

Mediator (Y) = IV + Z + IV*Z + E.

Recall that the mediator is now "Y," our dependent variable. We are hypothesizing that the mediator is a function of the IV, the moderator Z, and the product term IV*Z. If we do find evidence for an interaction, it will inform us that the path from IV to MEDIATOR, considered alone, is not consistent across the five levels of Z; that is, the IV predicts the MEDIATOR differentially across levels of Z.

It is also possible that the IV does NOT predict the mediator at all at some of these levels. Recall, however, that one condition for testing mediation in the first place is that the bivariate path from IV to MED is statistically significant. Still, just because the paths are statistically significant does not necessarily mean they are constant across levels of the moderator. We may expect the path from IV to MED to be statistically significant at each level of the moderator, but it is the interaction that tells us whether the paths (i.e., slopes) change across those levels.

Let's look at how we would test the interaction term for IV predicting MED, with Z as a hypothesized moderator. The following details how to test moderated regression with one continuous DV, one continuous IV, and one categorical moderator (we'll use a variable with 5 mutually exclusive categories for this example).

How to Test Moderation in SPSS when the DV and IV are Continuous, but the Moderator is Categorical

The question we asked above, that of whether the IV predicts the mediator differentially across levels of the moderator, is actually a classic question of moderated regression. For this example then, we are simply treating the mediator as the DV, without considering all simultaneous equations implied in the mediational model. Since the moderator is categorical in nature, we need to produce dummy-coded variables that represent the levels of the moderator. Recall that when dummy-coding, we always produce J - 1 dummy variables, where J is the number of levels of the moderator (here, 5 - 1 = 4). We also need to choose a reference group, which is the group that is NOT represented by a dummy variable of its own (it is coded 0 on all of them). Then, we'll cross each dummy variable with the continuous independent variable X. Before we show an example, it may be of use to forecast what the output of our regression will eventually look like. Here are the terms we can expect from our regression output once we implement the analysis:

X

Z1

Z2

Z3

Z4

X*Z1

X*Z2

X*Z3

X*Z4

The above are the terms we want to test. What do they mean? Let's break them down to know what conclusions we'll be able to draw (and not draw) from the ensuing regression that we will run:

X - this is the continuous IV, so we'll be able to conclude whether X predicts Y while holding Z at 0, just as we would in an ordinary multiple regression. Note that Z contains 5 levels, but is represented fully by Z1 through Z4. The effect of X is actually a simple slope, because it evaluates the slope of Y on X when Z = 0.

Z1 - this is the first of four Z variables, and represents a given level for the coded variable; the coefficient for Z1 represents a mean difference between the level coded as Z = 1 and the baseline category (which we'll identify as Z = 0). Because the model also contains product terms with X, this difference is the one expected at X = 0. Again, choose your reference category wisely - it will be the category against which you make comparisons.

Z2 - this is the second of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z2 represents a mean difference between the level coded as Z = 2 and the baseline category. Notice again that just as for Z1, the obtained coefficient is providing us with a comparison, the comparison being between means.

Z3 - this is the third of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z3 represents a mean difference between the level coded as Z = 3 and the reference category.

Z4 - this is the fourth of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z4 represents a mean difference between the level coded as Z = 4 and the reference category.

X*Z1 - the coefficient for this represents the difference in slopes between Y on X at Z = 1 and Y on X at Z = 0 (the reference category). We will examine this coefficient in some detail later once we obtain our results using fictitious data. We'll also plot the different slopes to visualize the effect.

X*Z2 - the coefficient for this represents the difference in slopes between Y on X at Z = 2 and Y on X at Z = 0 (the reference category). Again, as was true for X*Z1, X*Z2 represents a difference in slopes.

X*Z3 - the coefficient for this represents the difference in slopes between Y on X at Z = 3 and Y on X at Z = 0 (the reference category).

X*Z4 - the coefficient for this represents the difference in slopes between Y on X at Z = 4 and Y on X at Z = 0 (the reference category).
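To make the roles of these terms concrete, the model can be written out in symbols. The notation below (b_0, b_X, b_{Z_j}, b_{XZ_j}) is introduced here purely for illustration and is not used elsewhere in these notes; it is a sketch of the standard dummy-coded interaction model rather than anything specific to SPSS.

```latex
\begin{align*}
\hat{Y} &= b_0 + b_X X + \sum_{j=1}^{4} b_{Z_j} Z_j
           + \sum_{j=1}^{4} b_{XZ_j}\,(X \cdot Z_j) \\
\text{reference group } (Z = 0):\quad \hat{Y} &= b_0 + b_X X \\
\text{group } j \ (Z_j = 1):\quad \hat{Y} &= (b_0 + b_{Z_j}) + (b_X + b_{XZ_j})\,X
\end{align*}
```

Read this way, b_X is the slope of Y on X in the reference group, b_{Z_j} is the difference in predicted Y between group j and the reference group at X = 0, and b_{XZ_j} is the difference in slopes between group j and the reference group.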

Example Using Fictitious Data

Let's use fictitious data to illustrate the above moderation. Let's run things on 10 cases per group only. Here's how the data should look when entered into SPSS:

Notice that the Z variable (i.e., the moderator) has 5 - 1 = 4 columns. Subjects 1 through 10 are in group 1 of the moderating variable, subjects 11 through 20 are in group 2, subjects 21 through 30 are in group 3, subjects 31 through 40 are in group 4, and subjects 41 through 50 are in group 5. What is group 5? It is the reference category: it is coded 0 on all four dummy variables, so in what follows we refer to it as Z = 0 (or group 0). Your choice of a reference category should reflect a kind of "baseline" group against which you're interested in making pairwise comparisons. For instance, when we run the analysis, the regression coefficient for Z1 will reflect the comparison between those subjects in the reference group (Z = 0) and those subjects in the Z = 1 group. Likewise, we'll get another comparison between those subjects in the reference group and those subjects in the Z = 2 group. And so on.
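If the moderator is stored as a single grouping variable rather than as four columns, the dummy variables can be built with syntax. The following is a minimal sketch that assumes a hypothetical variable named group, coded 1 through 5, with 5 as the reference category; that variable name is an assumption and does not appear in the data described in these notes.

```spss
* Hypothetical variable: group, coded 1 to 5, with 5 as the reference.
* Each logical expression evaluates to 1 when true and 0 when false.
COMPUTE Z1 = (group = 1).
COMPUTE Z2 = (group = 2).
COMPUTE Z3 = (group = 3).
COMPUTE Z4 = (group = 4).
EXECUTE.
```

Cases in group 5 receive 0 on all four variables, which is what makes that group the reference category.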

Producing the Product-Terms

Next, we need to produce the relevant product terms. We ask SPSS to compute product terms of X with EACH coded Z variable as follows:

We've now created all the relevant product terms of X with the moderator Z. All possible interaction terms are included. If you're wondering why we didn't cross Z1 with Z2, for instance, that would be like crossing part of a variable with itself: because the categories are mutually exclusive, no case can have a 1 on both, so the product would be 0 for every case. The 4 dummy variables represent the 5 groups, and the product terms together represent all the possible differences in the slope of Y on X relative to the reference group. We're now ready to run the analysis.
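In syntax form, the product terms described above can be computed along these lines. The names X_Z1 through X_Z4 match those used in the REGRESSION command in these notes; the sketch assumes X and Z1 through Z4 already exist in the active dataset.

```spss
* Product of the continuous predictor with each dummy variable.
COMPUTE X_Z1 = X * Z1.
COMPUTE X_Z2 = X * Z2.
COMPUTE X_Z3 = X * Z3.
COMPUTE X_Z4 = X * Z4.
EXECUTE.
```

Each product equals X for cases in the corresponding group and 0 for everyone else.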

When you enter variables, it should look like this (be sure to enter all Z1 through Z4 to make sure the dummy-coded variable is properly represented):

When we run the regression, we get the following for output:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT DV
  /METHOD=ENTER X Z1 Z2 Z3 Z4 X_Z1 X_Z2 X_Z3 X_Z4.

We're first shown the variables entered/removed. We've entered all variables, so all looks good in the following:

Next, we get a summary of the model:

Notice that the model explains almost 41% of the variance in the dependent variable Y. We're not using that large of a sample size, so adjusted R-square is "punishing" us a bit, and bringing our explained variance down.

The model is statistically significant at p < .01 (F on 9 and 40 degrees of freedom equals 3.042, p = .007).

Next up, we're given the parameter estimates. The following table, and variations thereof, will consume our discussion for much of the remainder of these notes (we have an arrow pointing to ".812," and we'll explain this coefficient a bit later in the notes):

Let's interpret each and every one of the parameter estimates to indicate exactly what they mean:

X - as X increases by one unit, the expected change in Y is -.524 units when Z = 0 (as represented by the coded dummy variables). This interpretation is similar to that in ordinary least-squares regression. However, notice that we had to specify "when Z = 0." This is key. The effect of X is actually a simple slope because it represents Y on X when Z = 0. It looks like a "main effect" above, but it is actually a simple slope. If it were a traditional main effect, then the interpretation would be "as X increases by one unit, the expected change in Y is -.524 units, averaged across the levels of Z." It is very important to note that when interaction terms are present, the meaning of the main effect terms changes, as in the current case where we have the simple slope for when Z = 0.

Z1 - the difference between the group coded 0 and the group coded 1, evaluated at X = 0, is equal to -4.549. Although it represents a difference between predicted means, we can actually interpret it "regression-style" to clarify its meaning. That is, as we go from Z = 0 to Z = 1, the expected change in the Y variable is a decrease of 4.549 units. In other words, at X = 0 the expected mean of the group coded 0 is 4.549 units more than that of the group coded 1. Let's write out the equations to see this a bit better (only the terms that matter for this comparison are shown; the remaining terms drop out because they are multiplied by 0). For an individual in group 0, and having X = 0, the predicted score is:

Y = 8.190 -.524(X) -4.549(Z1)

= 8.190 -.524(0) -4.549(0)

= 8.190

Now, if that person were in group 1, but still having X = 0, we would have:

Y = 8.190 -.524(X) - 4.549(Z1)

= 8.190 -.524(0) - 4.549(1)

= 8.190 -4.549

= 3.641

The numbers 8.190 and 3.641 are predicted values (they are also means) for group 0 and group 1 respectively, given X = 0. Notice that as we go from group 0 to group 1, the predicted value drops by a magnitude of 4.549 units. This is exactly what the coefficient for Z1 is telling us.

We're tempted to verify this through a simpler analysis to make sure it's correct. Let's try to verify it. Let's calculate the mean of each group Z = 0 vs. Z = 1:

USE ALL.

COMPUTE filter_$=(Z1 = 1).

VARIABLE LABEL filter_$ 'Z1 = 1 (FILTER)'.

VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.

FORMAT filter_$ (f1.0).

FILTER BY filter_$.

EXECUTE .

Notice that the mean for group 1 is equal to 5.2. What is the mean for group 0?

FILTER OFF.

use 41 thru 50.

EXECUTE.

Why is the mean difference 5.2 - 5.1 = 0.1 and not -4.549 as the coefficient suggested? It is because the raw difference of 0.1 ignores X entirely, whereas the coefficient of -4.549 is the expected difference between the two groups at X = 0 in a model that also includes X and the product terms. Let's do a mini-analysis in which we leave X out and see if the result matches up to 0.1. It should. We will include only the Z variables in our analysis (so as to ignore the influence of X):

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT DV
  /METHOD=ENTER Z1 Z2 Z3 Z4.

The results from the above are the following:

Notice the coefficient for Z1, it is equal to .100, the exact same difference between means as we found previously when we did not partial out X. Notice as well that the intercept is equal to 5.100, which is the mean of the group coded 0. The above interpretation is simple, because we don't have X to partial out. As we go from group 0 to group 1, the expected increase in Y is equal to .100 (i.e., 5.100 to 5.200).
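The two group means used in this check can also be requested without leaving a filter switched on. The sketch below uses TEMPORARY so that each selection applies only to the procedure that immediately follows it. It assumes the dependent variable is named Y, as in the later syntax in these notes (the earlier REGRESSION commands call it DV), so adjust the name to match your dataset.

```spss
* Mean of Y for the group coded Z1 = 1.
TEMPORARY.
SELECT IF (Z1 = 1).
DESCRIPTIVES VARIABLES=Y
  /STATISTICS=MEAN STDDEV MIN MAX.

* Mean of Y for the reference group (0 on all four dummies).
TEMPORARY.
SELECT IF (Z1 = 0 AND Z2 = 0 AND Z3 = 0 AND Z4 = 0).
DESCRIPTIVES VARIABLES=Y
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Because the selection is temporary, all 50 cases are available again after each DESCRIPTIVES command, and no FILTER OFF step is needed.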

Let's return to the interpretation of coefficients, with X included. Recall the output:

We've already interpreted the value for Z1 as the difference between the reference group and the group dummy coded 1, with X in the model. Specifically, we can say that as we move from the reference group to group 1, the expected change in Y is a decrease of 4.549 units when X = 0. In other words, at X = 0 the difference in predicted means between the reference group and the group coded 1 is equal to 4.549, with the group coded Z = 1 having the lower value (because the sign of the coefficient is negative). We have to be sure to state this difference in the context of X being in the model and evaluated at 0, because the product terms make the group difference depend on X (as we'll see later on, the coefficients change when X is centered or left out of the model).

For Z2, the interpretation is analogous. As we go from Z = 0 to Z = 2, the expected change in Y is -5.315, that is, a decrease of 5.315 units. This again is the difference between the reference group and the group coded 2 at X = 0. Remember that you MUST interpret these differences under that condition, otherwise it is not an accurate interpretation of the given coefficient.

For Z3, the interpretation is that as we go from Z = 0 to Z = 3, the difference is equal to -9.271. That is, the expected value of Y decreases by 9.271 units from Z = 0 to Z = 3 at X = 0.

Finally, let's look at Z4 = -5.019. As we go from Z = 0 to Z = 4, the difference is equal to -5.019. Note carefully that though we're saying "from Z = 0 to Z = 4," we're not implying any kind of continuity between 0 and 4. We're simply using these as labels to describe the dichotomous situation represented by each dummy coefficient, or otherwise said, the comparison between groups.

Interpreting the Interaction Terms

Let's have a closer look at the interaction term X*Z1 (i.e., the term with the red arrow pointing toward it). What does it represent? It represents a difference in slopes. The difference in slopes is that between Y on X at Z = 0 versus Y on X at Z = 1. Because the coefficient is positive (.812), it means that the slope of Y on X at Z = 1 is .812 greater than the slope of Y on X at Z = 0. If we look at the other interaction terms, we see a similar pattern: relative to Z = 0, the slope of Y on X is greater at Z = 2 (by 1.106) and at Z = 3 (by 1.394), and the difference then drops a bit at Z = 4 (.998). Notice too that the terms are statistically significant, suggesting that these differences in slopes are probably not best explained by sampling error (or loosely, "chance"). It would appear there is an actual effect in the population from which these data were presumably sampled.
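The four product terms can also be tested as a set, which speaks directly to the question of whether the slope of Y on X is consistent across all five groups. One common way to do this, not part of the analysis reported in these notes, is to enter the product terms as a second block and request the R-square change statistics. The sketch assumes the dependent variable is named Y (the earlier REGRESSION commands in these notes call it DV).

```spss
* Block 1: X and the dummy variables. Block 2: the four product terms.
* CHANGE requests the R-square change and its F test for each block.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Y
  /METHOD=ENTER X Z1 Z2 Z3 Z4
  /METHOD=ENTER X_Z1 X_Z2 X_Z3 X_Z4.
```

The F change for the second block tests whether adding the four product terms improves the model, that is, whether the slopes differ anywhere across the groups.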

Visualizing the Simple Slopes

Let's look more closely at the coefficient for X*Z1. It is equal to .812. Again, literally, it means that the slope of Y on X is expected to increase by .812 units as we move from the reference group (Z = 0) to the group coded 1 (Z = 1). If we were to simply report our analyses in this way, it would be pretty hollow. We would like to produce a visual display so we can actually "see" the effect. In what follows we conduct two analyses: 1) the regression of Y on X for Z = 0, 2) the regression of Y on X for Z = 1. The difference in regression weights (raw b weights) between analyses should equal the coefficient value of .812 that we observed above, and we should get a powerful visual display of the slope difference. The following analyses should suggest the same result that we got from the above analysis that included interaction terms. Let's get started with the first analysis:

Analysis 1: Y on X when Z = 0.

To get this analysis, ask SPSS to select only those cases for which we have zeros for all dummy-coded variables, since this represents the reference group (cases 41 through 50). Recall the original coded data:

The data we want for the first analysis are in cases 41 through 50 (because that's the group for which Z = 0, it's the reference group). We can ask SPSS to select these cases by the command (or just use window commands, it's much more convenient than always typing in syntax when not necessary - we show the syntax to show the procedural steps, since it's sometimes difficult or inconvenient to show steps through window snapshots):

FILTER OFF.

use 41 thru 50.

EXECUTE.

Next, run the regression of Y on X. Because we've selected only those cases that represent the reference group, the ensuing regression will be Y on X for when Z = 0:

Notice the value of -.524 for X. We will use this value in a moment. It represents the expected change in Y for a one-unit increase in X when Z = 0. Notice that the coefficient is not statistically significant. In other words, this "simple slope" is not statistically significant.
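The regression for this analysis is not shown in syntax form above, so here is a minimal sketch of one way to request it. It selects the reference group by its dummy codes rather than by case number, and it assumes the dependent variable is named Y (adjust if your dataset uses DV).

```spss
* Analysis 1: Y on X within the reference group only.
TEMPORARY.
SELECT IF (Z1 = 0 AND Z2 = 0 AND Z3 = 0 AND Z4 = 0).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /NOORIGIN
  /DEPENDENT Y
  /METHOD=ENTER X.
```

Replacing the SELECT IF condition with (Z1 = 1) gives the corresponding regression for Analysis 2. The slope from a within-group regression matches the simple slope implied by the full model, but its standard error is based on that group's 10 cases alone, so its significance test will generally differ from one based on the full model.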

Analysis 2: Y on X when Z = 1.

Let's now run the regression of Y on X for when Z = 1. Again, select cases that represent group 1 for membership on the moderating variable Z. For our data, this is accomplished by:

FILTER OFF.

use 1 thru 10.

EXECUTE.

When we then run the regression, we obtain:

Notice the value of .289 for the coefficient for X. It represents the expected change in Y for a one-unit increase in X when Z = 1. Notice that the coefficient is not statistically significant. Again, just as was the case for Z = 0, this "simple slope" is not statistically significant. But nevertheless, what have we just calculated in these two separate regressions? When we subtract -.524 from .289, we obtain .289 - (-.524) = 0.813. What is this number of 0.813? It represents the difference between slopes of Y on X when Z = 0 vs. Y on X when Z = 1, and is identical (within slight rounding error) to the coefficient we found earlier in the full analysis, marked with the arrow in the following:

Notice as well that the slope difference is statistically significant (p = .022) but neither simple slope is statistically significant, as we saw in the above output. This is just fine: it simply means that neither slope is reliably different from zero on its own, but there still is a statistically significant difference between them. Notice as well what the coefficient is actually telling us. It's telling us that as we move from the reference group (Z = 0) to the group coded 1 (Z = 1), the slope INCREASES by .812. If this is true, then we should be able to visualize this in two separate plots to better understand the effect we've found. Let's produce a scatterplot for Y on X when Z = 0:

FILTER OFF.

use 41 thru 50.

EXECUTE.

Notice the direction of the relationship. It's negative. According to the regression coefficient of .812, when we plot for Y on X for Z = 1, we should see a .812 increase in the slope. Let's obtain the plot for Y on X when Z = 1 to visualize this effect:

FILTER OFF.

use 1 thru 10.

EXECUTE.

Notice that the slope has changed (the red lines are only approximate; they were drawn in manually and not fitted exactly according to the regression equation). By how much? By .812 units. That is, as our coefficient told us, the slope of Y on X goes from -.524 when Z = 0 to .289 when Z = 1, an increase of .812 units (within rounding error). Hence, we've visualized what the regression coefficient X*Z1 was telling us in the original analysis. We could do this for all of the product terms to get a feel for the respective interaction terms. As a guideline, whenever you present simple slopes analyses, it's always a good idea to plot the simple slopes following the original analysis with relevant product terms. The visualization provides a powerful way to gain an appreciation of what's actually going on in your data, and undoubtedly your audience is going to want to see these slopes and plots to get a feel for your findings.
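The case-selection commands above do not include the plotting step itself. A minimal sketch of how the two scatterplots might be requested is given below, again assuming the dependent variable is named Y. The first variable named on SCATTERPLOT is placed on the horizontal axis.

```spss
* Scatterplot of Y on X for the reference group (Z = 0).
TEMPORARY.
SELECT IF (Z1 = 0 AND Z2 = 0 AND Z3 = 0 AND Z4 = 0).
GRAPH
  /SCATTERPLOT(BIVAR)=X WITH Y.

* Scatterplot of Y on X for the group coded Z1 = 1.
TEMPORARY.
SELECT IF (Z1 = 1).
GRAPH
  /SCATTERPLOT(BIVAR)=X WITH Y.
```

A fitted line can then be added to each plot in the Chart Editor, which avoids having to draw the slopes in by hand.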

---------------------------------------------------------

How to Display the Effects of Z1, Z2, Z3, Z4

[this section is still under construction]

Group       n     Mean of Y    Mean of X
0          10     5.1000       5.9000
1          10     5.2000       5.4000
2          10     5.9000       5.2000
3          10     4.4000       6.3000
4          10     6.3000       6.6000
MM                5.3800       5.8800
Combined   50     5.3800       5.8800

---------------------------------------------------------

Obtaining Predicted Values in SPSS

Let's return to the table of coefficients:

Let's look at the constant of 8.190 (not the arrowed number, but rather the constant at the top of the table). What does this represent? It is the expected Y for an observation in the reference group (Z = 0) when X = 0. Notice that the coefficient for X of -.524 applies to the reference group only: in that group, a one-unit increase in X is associated with an expected change in Y of -.524 units. Had we centered X, the constant would instead represent the expected value of Y for someone in the reference group with an average level of X, and its value would change accordingly. We will talk about centering shortly.

Now, let's write out the equation for the first observation in group 1. That individual was in group 1 and had X = 6. What is that observation's predicted value?

Y = 8.190 -.524(6) - 4.549(1) + .812(6)

= 8.190 - 3.144 - 4.549 + 4.872

= 5.369

Thus, the predicted value on Y for someone in group 1 who has a value of 6 on X, is 5.369. We can ask SPSS to produce a whole vector of predicted values for our entire data set (we show the first few predicted values in what follows). Notice that our predicted value of 5.369 that we calculated matches up with the predicted value for observation 1 (within rounding error):

How well do our predicted values match up with the observed values? Let's correlate them:

CORRELATIONS
  /VARIABLES=Y PRE_2
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

What does this value of .637 represent? It is the multiple R from our analysis, since multiple R is the bivariate correlation between observed and predicted values. Recall the model summary:

Recall that the purpose of regression, no matter how simple or complex, is to test a model that does its best at reproducing the observed data. If the model reproduces the observed data perfectly, we would expect a multiple R of 1.0. Anything less, and the model isn't doing as good of a job.

Writing Out the Model Equations

In order to gain a better appreciation of what the estimated coefficients in the above regression mean exactly, let's write out a few of the equations. Here's the actual regression equation that we've estimated:

Y = 8.190 + (-.524)(X) + (-4.549)(Z1) + (-5.315)(Z2) + (-9.271)(Z3) + (-5.019)(Z4)

+ (.812)(X*Z1) + (1.106)(X*Z2) + (1.394)(X*Z3) + (.998)(X*Z4)

We can better appreciate what each coefficient is telling us if we consider some scenarios. For instance, suppose a given observation has X = 0, and is in group Z = 0. What would this mean for Z1? Because Z1 is not "activated," we would enter zero for it. Similarly for Z2, Z3 and Z4. And since X*Z1 through X*Z4 are indicators of an interaction with Z, these would all be zero as well. So, we would have:

Y = 8.190 + (-.524)(0) + (-4.549)(0) + (-5.315)(0) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(0) + (.998)(0)

= 8.190

So, for an observation with zero on X, and in the reference group (Z = 0), the predicted value on Y is equal to 8.190, which is the value of the intercept. Now, suppose an observation has an X score equal to 10, but is still in group Z = 0. The predicted value would be:

Y = 8.190 + (-.524)(10) + (-4.549)(0) + (-5.315)(0) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(0) + (.998)(0)

= 2.95

The predicted value for someone having a score of X = 10, and in the reference group (Z = 0) is equal to 2.95. Notice that the "10" on X brought the score down from the intercept value of 8.190. This is because the effect for X has a negative coefficient (controlling for Z). A unit increase in X equals an expected change in Y of -.524 units, and in the above we multiplied -.524 by 10 because X = 10. It's very reasonable then that our predicted Y dropped quite a bit.

We can keep producing predicted values for various combinations in the equation. Let's do one more. Assume the observation has X = 10 again, but instead of being in the reference group, the observation is in group 3 (Z = 3). Then we would have the following, being sure to "activate" Z = 3 (or "indicate" the variable, which is why we call it an indicator) and its product term, X*Z3 = 10(1) = 10:

Y = 8.190 + (-.524)(10) + (-4.549)(0) + (-5.315)(0) + (-9.271)(1) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(10) + (.998)(0)

= 8.190 -.524(10) -9.271(1) + 1.394(10)

= 8.190 -5.24 - 9.271 + 13.94

= 7.619

Let's do one more, this time for an observation actually in our data. Let's take observation 11 in our data. It has a Y value of 8, an X value of 5, is in group Z = 2, and therefore has a product term X*Z2 = 5. Its equation would be the following:

Y = 8.190 + (-.524)(5) + (-4.549)(0) + (-5.315)(1) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(5) + (1.394)(0) + (.998)(0)

= 8.190 -2.62 -5.315 + 5.53

= 5.785

Notice that our answer, within slight rounding error, is the predicted value produced by SPSS (count down to observation 11). What was the *actual* Y? It was 8, so we can say that our model didn't reproduce this data point as well as we would have wanted (but still did a decent job, depending on what the standard error of residuals turns out to be):

Why Bother Calculating Predicted Values Manually?

It may seem like a trivial exercise to calculate a few predicted values manually, but it isn't trivial at all. Rather, it's excellent practice at specifying the actual model equations, and at improving our understanding of what the coefficients mean in a relatively complex regression with interaction regressors. For instance, if you were asked what the model equation looked like for someone with X = 0 in the reference group (Z = 0), you'd have no trouble writing it out. Similarly, if you were asked what the model equation looked like for someone with X = 10 and in group 4 (Z = 4), you'd again have little difficulty in writing out the equation. Yes, software will compute predicted values for us, but it's always useful to try a few on your own to make sure you're clear on how these predicted values are being produced, especially when the regression model is relatively complex and involves interaction terms, etc. It also helps you familiarize yourself with the model equations.
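The same hand calculation can be applied to every case at once by typing the estimated equation into a COMPUTE statement. The sketch below copies the coefficients from the coefficient table in these notes; the variable name Y_hat is a hypothetical name chosen for illustration, and the product terms X_Z1 through X_Z4 are assumed to exist already.

```spss
* Predicted values from the estimated (uncentered) equation.
* Y_hat is a hypothetical name introduced for this illustration.
COMPUTE Y_hat = 8.190 - .524*X - 4.549*Z1 - 5.315*Z2 - 9.271*Z3 - 5.019*Z4
    + .812*X_Z1 + 1.106*X_Z2 + 1.394*X_Z3 + .998*X_Z4.
EXECUTE.

* List the first 11 cases to compare observed and predicted values.
LIST VARIABLES=Y Y_hat
  /CASES=FROM 1 TO 11.
```

The resulting values should agree with the predicted values saved by REGRESSION up to small differences caused by rounding the coefficients to three decimals.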

Mean Centering the Continuous Predictor X

To aid in the interpretation of parameter estimates, it's helpful to mean center the continuous predictor before running the analysis. The mean of variable X is equal to 5.88. To mean center, we command SPSS to subtract this value from each X data point for each individual:

COMPUTE X_cent = X-5.88.

EXECUTE.
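Typing the mean in by hand works for these data, but the mean can also be computed by SPSS and attached to every case, which avoids rounding and retyping. The following is a sketch of that alternative; X_mean is a hypothetical variable name introduced here, and the empty BREAK subcommand means the mean is taken over all cases.

```spss
* Add the overall mean of X to every case as a new variable.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /X_mean=MEAN(X).

* Center X at its overall mean.
COMPUTE X_cent = X - X_mean.
EXECUTE.
```

For the data in these notes, X_mean is 5.88 for every case, so X_cent is the same as the hand-typed version.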

The new values for X are then produced in SPSS:

To verify that SPSS did it correctly, consider the first centered value of .12. It was computed by taking 6 (the X value for observation 1 in our data) minus 5.88, which equals .12. Consider what centering the predictor accomplishes. Recall that previously, the intercept in our regression represented the expected value of Y when X = 0. When we center, the intercept will represent the expected value of Y when X still equals 0, but zero now represents the mean of X, and not truly a zero score on X. To understand this better, consider the centering effect for an observation with a score of 5.88. When we center, we get 0 (5.88-5.88). So, when X_cent = 0, it's actually at the mean of X. Let's produce the relevant product terms using the newly centered X variable:

COMPUTE X_cent_Z1 = X_cent * Z1.

EXECUTE.

COMPUTE X_cent_Z2 = X_cent * Z2.

EXECUTE.

COMPUTE X_cent_Z3 = X_cent * Z3.

EXECUTE.

COMPUTE X_cent_Z4 = X_cent * Z4.

EXECUTE.

The new interaction terms will be produced in SPSS when we take the new products:

The product terms using the centered X variable have now been produced, and we can re-run our regression analysis to learn how to interpret the coefficients when X is centered:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Y
  /METHOD=ENTER X_cent Z1 Z2 Z3 Z4 X_cent_Z1 X_cent_Z2 X_cent_Z3 X_cent_Z4
  /SAVE PRED .

We see that the model R of .637 is identical. This is no surprise, since the centering of X didn't really change anything in terms of predictive power of the model, it simply helps us interpret the coefficients a bit better. By centering, we conducted a linear transformation, and so that R remains constant was entirely expected.

Similarly, the statistical significance of the model is identical to when X was not centered. Again, this is expected. Next up, let's take a look at the coefficients. This is where we notice changes:

Here's where centering X pays dividends in interpretation. Look at the value for the intercept. It is equal to 5.110. What is this value? It is the predicted value for the reference group when the centered X equals 0, which, because we've centered X, means that it is the predicted value for the reference group when X equals its MEAN (and not actually zero, as was true when X was not centered). What is the mean of X? Its original mean (before centering) was 5.88. So, the constant of 5.110 means that the predicted value of Y for an individual in the reference group with an average amount of X (centered X = 0, which means X = 5.88) is 5.110. How do we know this is the predicted value for the reference group? Because if it's for the reference group, then all Z will equal 0, and we'll have (X in the equations below denotes the centered variable X_cent):

Y = 5.110 -.524(X) + .228(Z1) + 1.185(Z2) + (-1.076)(Z3) + .848(Z4) + .812(X*Z1) + 1.106(X*Z2) + 1.394(X*Z3) + .998(X*Z4)

= 5.110 -.524(0) + .228(0) + 1.185(0) + (-1.076)(0) + .848(0) + .812(0) + 1.106(0) + 1.394(0) + .998(0)

= 5.110

Notice that our predicted Y of 5.110 matches that of the intercept term. Because we centered X, when we input X = 0, we're actually evaluating at the mean of X, rather than when X actually equals 0 such as before when it was not centered. The term "-.524(0)" now says, "at the mean of X".

Let's keep X = 0, but evaluate when Z = 1. The value for Z1 is .228, which means that as we go from group 0 to group 1, the expected increase in Y is .228 units. Realize that this represents a contrast between the reference group and the group coded 1. Let's evaluate the equation:

Y = 5.110 -.524(X) + .228(Z1) + 1.185(Z2) + (-1.076)(Z3) + .848(Z4) + .812(X*Z1) + 1.106(X*Z2) + 1.394(X*Z3) + .998(X*Z4)

= 5.110 -.524(0) + .228(1) + 1.185(0) + (-1.076)(0) + .848(0) + .812(0) + 1.106(0) + 1.394(0) + .998(0)

= 5.110 + .228

= 5.338

Hence, we see that being in group 1 compared to group 0 (reference group) increases the predicted Y by .228, resulting in a predicted value of 5.338. To see the value in this kind of prediction, imagine you were to guess the Y score of a person standing behind a closed door. You know nothing about the person, except that you know they have average X, and are in group 1 of the moderator rather than group 0. You could reason as follows: "A good prediction for someone with average X in the reference group would be 5.110. But, if I know that person is in group 1 rather than group 0, I'm going to increase my estimate by .228 units, for a guess of 5.338. That's my best prediction for a person exhibiting these characteristics."
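The same "closed door" reasoning extends to every group. As a small sketch (Python, with names of our own choosing), the centered coefficients quoted above give the best guess for a person with average X in each group; at centered X = 0 all product terms drop out, so only the intercept and the relevant Z coefficient are needed.

```python
# Centered-model estimates quoted in the equation above.
CENTERED_INTERCEPT = 5.110
CENTERED_B_Z = {1: 0.228, 2: 1.185, 3: -1.076, 4: 0.848}


def predict_at_mean_x(group):
    """Predicted Y for a person with average X in `group` (0 = reference group)."""
    return CENTERED_INTERCEPT + CENTERED_B_Z.get(group, 0.0)


guesses = {g: round(predict_at_mean_x(g), 3) for g in range(5)}
for group, guess in guesses.items():
    print(group, guess)
```

This shortcut only holds at the mean of X; for any other X the slope and product terms have to be included as well.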

An Analysis Without Partialling Out X

Consider an analysis in which we do not partial out X, but rather simply analyze the effect of the Z-variable:

What is the constant 5.100? It is the mean of the group coded 0 (the reference group). The Z1 coefficient of .100 indicates that as we go from Z = 0 to Z = 1, the increase in the mean is of the magnitude .100. So, the mean for group 1 is 5.200. The coefficient .800 indicates that as we go from group 0 to group 2, the expected mean increase is of the order .800, indicating that the mean for group 2 is 5.100 + .800 = 5.900. For Z = 3, the coefficient of -.700 indicates that the difference between means in group 0 to group 3 is of the magnitude .700, but this time, because the sign of the coefficient is negative, it represents a decrease in the mean rather than an increase. That is, the mean of group 3 is 5.100 - .700 = 4.400. Finally, for Z = 4, we have a coefficient of 1.200. This means that the expected difference in means between group 0 (reference group) and group 4 is of the order 1.200. Since the sign is positive, it indicates that the mean of group 4 is 1.200 units greater than the mean of 5.100. So, the mean of group 4 is 5.100 + 1.200 = 6.300.
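The arithmetic in the paragraph above can be written compactly. This is only an illustrative sketch in Python (the names are ours): each group mean is the intercept plus that group's dummy coefficient.

```python
# Estimates from the Z-only regression discussed above.
INTERCEPT = 5.100                                        # mean of the reference group
DUMMY_COEFS = {1: 0.100, 2: 0.800, 3: -0.700, 4: 1.200}  # Z1..Z4

group_means = {0: INTERCEPT}
for group, coef in DUMMY_COEFS.items():
    # Each coefficient is a mean difference from the reference group.
    group_means[group] = INTERCEPT + coef

for group in sorted(group_means):
    print(group, round(group_means[group], 3))
```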

Notice that without partialling out the continuous predictor X, the means derived from the coefficient table match up to the actual means for the groups, as we can easily verify by obtaining descriptives on the group means across levels of Z. Notice the means in the following descriptives are 5.100, 5.200, 5.900, 4.400, 6.300 for Z groups 0, 1, 2, 3, 4 respectively. These group means match up perfectly with the means we figured out from the coefficient table:

**DATA & DECISION**, Copyright 2010, Daniel J. Denis, Ph.D., Department of Psychology, University of Montana. Contact Daniel J. Denis by e-mail: daniel.denis@umontana.edu.


Mediator (Y) = IV + Z + IV*Z + E.

Recall that the mediator is now "Y," our dependent variable. We are hypothesizing that the mediator is a function of the IV, the moderator Z, and the product term IV*Z. If we do find evidence for an interaction, it will inform us that the path from IV to MEDIATOR, considered alone, is not consistent across the five levels of Z; that is, the IV predicts the MEDIATOR differentially across levels of Z. Note too that it is possible that the IV does NOT predict the mediator at all at some of these levels, although one condition for testing mediation in the first place is that the bivariate path from IV to MED be statistically significant. But just because the paths are statistically significant does not necessarily mean they are constant across levels of the moderator. That is, we may expect the path from IV to MED to be statistically significant at each level of the moderator, but the interaction is going to tell us whether the paths (i.e., slopes) change across levels of the moderator. Let's look at how we would test the interaction term for IV predicting MED, with Z as a hypothesized moderator. The following details how to test moderated regression with one continuous DV, one continuous IV, and one categorical moderator (we'll use a variable with 5 mutually exclusive categories for this example).

How to Test Moderation in SPSS when the DV and IV are Continuous, but the Moderator is Categorical

The question we asked above, whether the IV predicts the mediator differentially across levels of the moderator, is actually a classic question of moderated regression. For this example then, we are simply treating the mediator as the DV, without considering all simultaneous equations implied in the mediational model. Since the moderator is categorical in nature, we need to produce dummy-coded variables that represent the levels of the moderator. Recall that when dummy-coding, we always produce J - 1 coded variables, where J is the number of levels of the moderator. Then, we'll cross each coded variable with the continuous independent variable X. Before we show an example, it may be of use to forecast what the output of our regression will eventually look like. Here are the terms we want to test, and the information provided by each term (see below). Recall also that when dummy-coding, we need to choose a reference group, which is the group that is NOT represented by its own coded variable. Here are the terms we can expect from our regression output once we implement the analysis:

X

Z1

Z2

Z3

Z4

X*Z1

X*Z2

X*Z3

X*Z4

The above are the terms we want to test. What do they mean? Let's break them down to know what conclusions we'll be able to draw (and not draw) from the ensuing regression that we will run:

X - this is the continuous IV, so we'll be able to conclude whether X predicts Y while holding Z at 0, just as we would in an ordinary multiple regression. Note that Z contains 5 levels, but is represented fully by Z1 through Z4. The effect of X is actually a simple slope, because it evaluates the slope of Y on X when Z = 0.

Z1 - this is the first of four Z variables, and represents a given level for the coded variable; the coefficient for Z1 represents a mean difference between the level coded as Z = 1 and the baseline category (which we'll identify as Z = 0). Because the model also contains product terms, this difference is evaluated at X = 0. Again, choose your reference category wisely - it will be the category against which you would like to make comparisons.

Z2 - this is the second of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z2 represents a mean difference between the level coded as Z = 2 and the baseline category. Notice again that just as for Z1, the obtained coefficient is providing us with a comparison, the comparison being between means.

Z3 - this is the third of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z3 represents a mean difference between the level coded as Z = 3 and the reference category.

Z4 - this is the fourth of the four Z variables, and again represents a given level for the coded variable; the coefficient for Z4 represents a mean difference between the level coded as Z = 4 and the reference category.

X*Z1 - the coefficient for this represents the difference in slopes between Y on X at Z = 1 and Y on X at Z = 0 (the reference category). We will examine this coefficient in some detail later once we obtain our results using fictitious data. We'll also plot the different slopes to visualize the effect.

X*Z2 - the coefficient for this represents the difference in slopes between Y on X at Z = 2 and Y on X at Z = 0 (the reference category). Again, as was true for X*Z1, X*Z2 represents a difference in slopes.

X*Z3 - the coefficient for this represents the difference in slopes between Y on X at Z = 3 and Y on X at Z = 0 (the reference category).

X*Z4 - the coefficient for this represents the difference in slopes between Y on X at Z = 4 and Y on X at Z = 0 (the reference category).

Example Using Fictitious Data

Let's use fictitious data to illustrate the above moderation. Let's run things on 10 cases per group only. Here's how the data should look when entered into SPSS:

Notice that the Z variable (i.e., the moderator) has 5 - 1 = 4 columns. Subjects 1 through 10 are in group 1 of the moderating variable. Subjects 11 through 20 are in group 2 of the moderating variable. Subjects 21 through 30 are in group 3 of the moderating variable. Subjects 31 through 40 are in group 4 of the moderating variable. Finally, subjects 41 through 50 are in group 5 of the moderating variable. What is group 5? It is the reference category: it has zeros on all four coded variables, and we refer to it as Z = 0 in what follows. Your choice of a reference category should reflect a kind of "baseline" group against which you're interested in making mean pairwise comparisons. For instance, when we run the analysis, the regression coefficient for Z = 1 will reflect the mean comparison between those subjects in the reference group (Z = 0) and those subjects in the Z = 1 group. Likewise, we'll get another comparison between those subjects in the reference group and those subjects in the Z = 2 group. And so on.
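The coding scheme just described can be sketched in a few lines. The example below is written in Python rather than SPSS, and the function name and the subject-to-group mapping are our own illustration of the layout described above (10 subjects per group, group 5 as the reference category).

```python
def dummy_code(group, n_groups=5, reference=5):
    """Return the J - 1 indicator values (Z1..Z4 here) for one subject.

    Groups are labelled 1..n_groups; the reference group gets all zeros.
    """
    levels = [g for g in range(1, n_groups + 1) if g != reference]
    return [1 if group == level else 0 for level in levels]


# Subjects 1-10 are in group 1, 11-20 in group 2, ..., 41-50 in group 5.
subject_group = {s: (s - 1) // 10 + 1 for s in range(1, 51)}
codes = {s: dummy_code(g) for s, g in subject_group.items()}

print(codes[1])    # a subject in group 1
print(codes[11])   # a subject in group 2
print(codes[41])   # a subject in the reference group
```

Each subject has at most one indicator equal to 1, which is what makes the four columns a complete representation of the five groups.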

Producing the Product-Terms

Next, we need to produce the relevant product terms. We ask SPSS to compute product terms of X with EACH coded Z variable as follows:

We've now created all the relevant product terms of X with the moderator Z. All possible interaction terms are included. If you're wondering why we didn't cross Z1 with Z2, for instance, that would be like crossing part of a variable with itself: because the categories are mutually exclusive, no case can have a 1 on both Z1 and Z2, so that product would always be zero. The 4 coded variables represent the 5 groups, and the product terms capture every way in which the slope of Y on X can differ from the slope in the reference group. We're now ready to run the analysis.
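To make the product terms concrete, here is a short sketch in Python (not SPSS; the function name is ours). It uses the two observations described in these notes, observation 1 (X = 6, group 1) and observation 11 (X = 5, group 2), and also confirms that crossing Z1 with Z2 yields nothing but zeros.

```python
def product_terms(x, dummies):
    """Multiply the continuous predictor by each dummy code (X*Z1..X*Z4)."""
    return [x * z for z in dummies]


obs_1 = product_terms(6, [1, 0, 0, 0])    # observation 1: X = 6, group 1
obs_11 = product_terms(5, [0, 1, 0, 0])   # observation 11: X = 5, group 2
print(obs_1)
print(obs_11)

# The five possible dummy patterns (four coded groups plus the reference group).
codings = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# Z1*Z2 is zero for every pattern, so it carries no information.
z1_times_z2 = [row[0] * row[1] for row in codings]
print(z1_times_z2)
```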

When you enter variables, it should look like this (be sure to enter all Z1 through Z4 to make sure the dummy-coded variable is properly represented):

When we run the regression, we get the following for output:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT DV
  /METHOD=ENTER X Z1 Z2 Z3 Z4 X_Z1 X_Z2 X_Z3 X_Z4.

We're first shown the variables entered/removed. We've entered all variables, so all looks good in the following:

Next, we get a summary of the model:

Notice that the model explains almost 41% of the variance in the dependent variable Y. Our sample is not very large relative to the nine predictors in the model, so adjusted R-square is "punishing" us a bit, and bringing our explained variance down.

The model is statistically significant at p < .01 (F on 9 and 40 degrees of freedom equals 3.042, p = .007).
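The summary statistics are tied together by standard formulas, which can be checked with a short sketch (Python, with a function name of our own). It starts from the multiple R of .637 reported for this model elsewhere in these notes, with n = 50 cases and k = 9 predictors; because R is rounded, the F computed here only approximates the 3.042 in the output.

```python
def r_squared_summary(multiple_r, n, k):
    """Return (R-square, adjusted R-square, F) for a model with k predictors."""
    r2 = multiple_r ** 2
    df_resid = n - k - 1
    # Adjusted R-square shrinks R-square more when n is small relative to k.
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    # Overall F test on k and n - k - 1 degrees of freedom.
    f_stat = (r2 / k) / ((1 - r2) / df_resid)
    return r2, adj_r2, f_stat


r2, adj_r2, f_stat = r_squared_summary(0.637, n=50, k=9)
print(round(r2, 3), round(adj_r2, 3), round(f_stat, 2))
```

The size of the gap between R-square and adjusted R-square is the "punishment" referred to above.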

Next up, we're given the parameter estimates. The following table, and variations thereof, will consume our discussion for much of the remainder of these notes (we have an arrow pointing to ".812," and we'll explain this coefficient a bit later in the notes):

Let's interpret each and every one of the parameter estimates to indicate exactly what they mean:

X - as X increases by one unit, the expected change in Y is -.524 units when Z = 0 (as represented by the coded dummy variables). This interpretation is similar to that in ordinary least-squares regression. However, notice that we had to specify "when Z = 0." This is key. The effect of X is actually a simple slope because it represents Y on X when Z = 0. It looks like a "main effect" above, but it is actually a simple slope. If it were a traditional main effect, then the interpretation would be "as X increases by one unit, the expected change in Y is -.524 units, regardless of the level of Z." It is very important to note that when interaction terms are present, the meaning of main effect terms changes, as in the current case where we have the simple slope for when Z = 0.

Z1 - the mean difference between the group coded 0 and the group coded 1 is equal to -4.549. Although it represents a mean difference, we can actually interpret it "regression-style" to clarify its meaning. That is, as we go from Z = 0 to Z = 1, the expected change in the Y variable is a decrease of 4.549 units. In other words, the expected mean of the group coded 0 is 4.549 units more than the mean of the group coded 1. Let's write out the equations to see this a bit better. For an individual in group 0, and having X = 0, the predicted score is:

Y = 8.190 -.524(X) -4.549(Z1)

= 8.190 -.524(0) -4.549(0)

= 8.190

Now, if that person were in group 1, but still having X = 0, we would have:

Y = 8.190 -.524(X) - 4.549(Z1)

= 8.190 -.524(0) - 4.549(1)

= 8.190 -4.549

= 3.641

The numbers 8.190 and 3.641 are predicted values (they are also means) for group 0 and group 1 respectively, given X = 0. Notice that as we go from group 0 to group 1, the predicted value drops by a magnitude of 4.549 units. This is exactly what the coefficient for Z1 is telling us.

We're tempted to verify this through a simpler analysis to make sure it's correct. Let's try to verify it. Let's calculate the mean of each group Z = 0 vs. Z = 1:

USE ALL.
COMPUTE filter_$=(Z1 = 1).
VARIABLE LABEL filter_$ 'Z1 = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

Notice that the mean for group 1 is equal to 5.2. What is the mean for group 0?

FILTER OFF.
use 41 thru 50.
EXECUTE.

Why is the mean difference 5.2 - 5.1 = 0.1 and not -4.549 as the coefficient suggested? It is because the difference of 0.1 is a raw difference between group means that does not take X into consideration, whereas -4.549 is the predicted difference between the groups at X = 0 in a model that includes X and the product terms. Let's do a mini-analysis in which we do not partial out X from the difference and see if it matches up to 0.1. It should. We will only include the Z variable in our analysis (so as to ignore the influence of X):

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT DV
  /METHOD=ENTER Z1 Z2 Z3 Z4.

The results from the above are the following:

Notice the coefficient for Z1: it is equal to .100, exactly the difference between means that we found previously when we did not partial out X. Notice as well that the intercept is equal to 5.100, which is the mean of the group coded 0. The above interpretation is simple, because we don't have X to partial out. As we go from group 0 to group 1, the expected increase in Y is equal to .100 (i.e., 5.100 to 5.200).

Let's return to the interpretation of coefficients, with X included. Recall the output:

We've already interpreted the value for Z1 as the difference between the reference group and the group dummy coded 1, controlling for X. Because the model includes product terms, "controlling for X" here means evaluating the difference at X = 0. Specifically, we can say that as we move from the reference group to group 1, the expected change in Y is a decrease of 4.549 units when X = 0. In other words, the difference in predicted means between the reference group and the group coded 1 is equal to 4.549 at X = 0, with the group coded Z = 1 having the lower mean (because the sign of the coefficient is negative). We have to be sure to state this difference in the context of X (as we'll see later on, the coefficients change when X is left out of the model).

For Z2, the interpretation is analogous. As we go from Z = 0 to Z = 2, the expected change in Y is equal to -5.315, that is, a decrease of 5.315 units. This again is the mean difference between the reference group and the group coded 2, partialling out X. Remember that you MUST interpret these mean differences under the condition that X has been partialled out, otherwise it is not an accurate interpretation of the given coefficient.

For Z3, the interpretation is that as we go from Z = 0 to Z = 3, the difference in means is equal to -9.271. That is, the expected mean decreases by 9.271 units from Z = 0 to Z = 3, controlling for X.

Finally, let's look at Z4 = -5.019. As we go from Z = 0 to Z = 4, the difference in means is equal to -5.019. Note carefully that though we're saying "from Z = 0 to Z = 4," we're not implying any kind of continuity between 0 and 4. We're simply using these as labels to describe the dichotomous situation represented by each dummy coefficient, or otherwise said, the comparison between groups.
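One way to see why these Z coefficients must be read together with X is to compute the predicted gap between a coded group and the reference group at several values of X. The sketch below is in Python with names of our own; it uses the Z coefficients above and the product-term coefficients from the same coefficient table, which are interpreted in the next section. The X values 0, 5 and 10 are chosen only for illustration.

```python
# Estimates from the full model (uncentered X), as reported in these notes.
B_Z = {1: -4.549, 2: -5.315, 3: -9.271, 4: -5.019}    # Z1..Z4
B_XZ = {1: 0.812, 2: 1.106, 3: 1.394, 4: 0.998}       # X*Z1..X*Z4


def gap_vs_reference(group, x):
    """Predicted Y for `group` minus predicted Y for the reference group at the same X."""
    return B_Z[group] + B_XZ[group] * x


for x in (0, 5, 10):
    print(x, [round(gap_vs_reference(g, x), 3) for g in (1, 2, 3, 4)])
```

At X = 0 the gaps are exactly the Z coefficients; at other values of X they differ, which is why a raw mean difference need not resemble the coefficient.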

Interpreting the Interaction Terms

Let's have a closer look at the interaction term X*Z1 (i.e., the term with the red arrow pointing toward it). What does it represent? It represents a difference in slopes. The difference in slopes is that between Y on X at Z = 0 versus Y on X at Z = 1. Because the coefficient is positive (.812), it means that the slope of Y on X at Z = 1 is .812 greater than the slope of Y on X at Z = 0. If we look at the other interaction terms, we see a similar trend, that there is an increase in the slope of Y on X as we go from Y on X at Z = 0 to Y on X at Z = 2, Z = 3, but then the slope drops a bit at Z = 4. Notice too that the terms are statistically significant, suggesting that these differences in slopes are probably not best explained by sampling error (or loosely, "chance"). It would appear there is an actual effect in the population from which these data were presumably sampled.

Let's look more closely at the coefficient for X*Z1. It is equal to .812. Again, literally, it means that the expected slope of Y on X increases by .812 units as we move from the reference group (Z = 0) to the group coded 1 (Z = 1). If we were to simply report our analyses in this way, it would be pretty hollow. We would like to produce a visual display so we can actually "see" the effect. In what follows we conduct two analyses: 1) the regression of Y on X for Z = 0, 2) the regression of Y on X for Z = 1. The difference in regression weights (raw b weights) between analyses should equal the coefficient value of .812 that we observed above, and we should get a powerful visual display of the slope difference. The following analyses should suggest the same result that we got from the above analysis that included interaction terms. Let's get started with the first analysis:

Analysis 1: Y on X when Z = 0.

To get this analysis, ask SPSS to select only those cases for which we have zeros for all dummy-coded variables, since this represents the reference group (cases 41 through 50). Recall the original coded data:

The data we want for the first analysis are in cases 41 through 50 (because that's the group for which Z = 0, the reference group). We can ask SPSS to select these cases with the syntax below. The window commands do the same thing and are often more convenient than typing syntax; we show the syntax because it makes the procedural steps explicit, and steps are sometimes difficult or inconvenient to show through window snapshots:

FILTER OFF.
use 41 thru 50.
EXECUTE.

Next, run the regression of Y on X. Because we've selected only those cases that represent the reference group, the ensuing regression will be Y on X for when Z = 0:

Notice the value of -.524 for X. We will use this value in a moment. It represents the expected change in Y for a one-unit increase in X when Z = 0. Notice that the coefficient is not statistically significant. In other words, this "simple slope" is not statistically significant.

Analysis 2: Y on X when Z = 1.

Let's now run the regression of Y on X for when Z = 1. Again, select cases that represent group 1 for membership on the moderating variable Z. For our data, this is accomplished by:

FILTER OFF.
use 1 thru 10.
EXECUTE.

When we then run the regression, we obtain:

Notice the value of .289 for the coefficient for X. It represents the expected change in Y for a one-unit increase in X when Z = 1. Notice that the coefficient is not statistically significant. Again, just as was the case for Z = 0, this "simple slope" is not statistically significant. But nevertheless, what have we just calculated in these two separate regressions? When we subtract -.524 from .289, we obtain .289 - (-.524) = 0.813. What is this number of 0.813? It represents the difference between slopes of Y on X when Z = 0 vs. Y on X when Z = 1, and is identical (within slight rounding error) to the coefficient we found earlier in the full analysis, marked with the arrow in the following:

Notice as well that the slope difference is statistically significant (p = .022) but neither simple slope is statistically significant, as we saw in the above output. This is just fine, and it simply means that neither slope is really doing much alone, but there still is a statistically significant difference between them. Notice as well what the coefficient is actually telling us. It's telling us that as we move from the reference group (Z = 0) to the group coded 1 (Z = 1), the slope INCREASES by .812. If this is true, then we should be able to visualize this in two separate plots to better understand the effect we've found. Let's produce a scatterplot for Y on X when Z = 0:

FILTER OFF.
use 41 thru 50.
EXECUTE.

Notice the direction of the relationship. It's negative. According to the regression coefficient of .812, when we plot for Y on X for Z = 1, we should see a .812 increase in the slope. Let's obtain the plot for Y on X when Z = 1 to visualize this effect:

FILTER OFF.
use 1 thru 10.
EXECUTE.

Notice that the slope has changed (the red slopes are only approximate; they were drawn in manually and not fitted exactly according to the regression equation). By how much? By .812 units. That is, as our coefficient told us, the slope of Y on X goes from -.524 when Z = 0 to .289 when Z = 1, an increase of .812 units (within rounding). Hence, we've visualized what the regression coefficient X*Z1 was telling us in the original analysis. We could do this for all of the product terms to get a feel for the respective interaction terms. As a guideline, whenever you present simple slopes analyses, it's always a good idea to plot the simple slopes following the original analysis with relevant product terms. The visualization provides a powerful way to gain an appreciation of what's actually going on in your data, and undoubtedly your audience is going to want to see these slopes and plots to get a feel for your findings.
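Doing this "for all of the product terms" amounts to adding each product-term coefficient to the reference-group slope. Here is a brief sketch in Python (names are ours), using the coefficients reported in these notes; the slope for Z = 1 comes out as .288 rather than the .289 from the separate regression, which is just rounding in the printed coefficients.

```python
B_X = -0.524                                          # slope of Y on X when Z = 0
B_XZ = {1: 0.812, 2: 1.106, 3: 1.394, 4: 0.998}       # X*Z1..X*Z4

# Simple slope in each group: reference slope plus that group's slope difference.
simple_slopes = {0: B_X}
for group, diff in B_XZ.items():
    simple_slopes[group] = B_X + diff

for group in sorted(simple_slopes):
    print(group, round(simple_slopes[group], 3))
```

These are the slopes one would plot, one line per group, to display the interaction.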

---------------------------------------------------------

How to Display the Effects of Z1, Z2, Z3, Z4

[this section is still under construction]

Group       n    Mean of Y    Mean of X
0          10    5.1000       5.9000
1          10    5.2000       5.4000
2          10    5.9000       5.2000
3          10    4.4000       6.3000
4          10    6.3000       6.6000
MM               5.3800       5.8800
Combined   50    5.3800       5.8800

---------------------------------------------------------

Obtaining Predicted Values in SPSS

Let's return to the table of coefficients:

Let's look at the constant of 8.190 (not the arrowed number, but rather the constant at the top of the table). What does this represent? It is the expected Y for an observation in the reference group (Z = 0) when X = 0. Notice that within the reference group, a one-unit increase in X is associated with an expected change in Y of -.524 units; the slopes for the other groups are obtained by adding the relevant product-term coefficient. Had we centered X, the constant would instead represent the expected value of Y for someone in the reference group with an average level of X. We will talk about centering shortly.

Now, let's write out the equation for the first observation in group 1. That individual was in group 1 and had X = 6. What is that observation's predicted value?

Y = 8.190 -.524(6) - 4.549(1) + .812(6)

= 8.190 - 3.144 - 4.549 + 4.872

= 5.369

Thus, the predicted value on Y for someone in group 1 who has a value of 6 on X, is 5.369. We can ask SPSS to produce a whole vector of predicted values for our entire data set (we show the first few predicted values in what follows). Notice that our predicted value of 5.369 that we calculated matches up with the predicted value for observation 1 (within rounding error):

How well do our predicted values match up with the observed values? Let's correlate them:

CORRELATIONS
  /VARIABLES=Y PRE_2
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

What does this value of .637 represent? It is the multiple R from our analysis, since multiple R is the bivariate correlation between observed and predicted values. Recall the model summary:

Recall that the purpose of regression, no matter how simple or complex, is to test a model that does its best at reproducing the observed data. If the model reproduces the observed data perfectly, we would expect a multiple R of 1.0. Anything less, and the model isn't doing as good a job.
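The claim that multiple R is the correlation between observed and predicted values can be checked on any least-squares fit. The sketch below is in Python and uses a small set of hypothetical scores (NOT the data analyzed in these notes) with a single predictor, purely to keep the arithmetic short; all names are our own.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Bivariate Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ss_a = sum((ai - ma) ** 2 for ai in a)
    ss_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(ss_a * ss_b)


# Hypothetical illustration data (not the data set from these notes).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

# Least-squares fit of y on x.
mx, my = mean(x), mean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
predicted = [intercept + slope * xi for xi in x]

# R-square from sums of squares, and R as the correlation of observed with predicted.
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
multiple_r = pearson_r(y, predicted)

print(round(multiple_r, 3), round(multiple_r ** 2, 3), round(r_squared, 3))
```

Squaring the observed-predicted correlation reproduces the R-square obtained from the sums of squares, which is the relationship the model summary relies on.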

Writing Out the Model Equations

In order to gain a better appreciation of what the estimated coefficients in the above regression mean exactly, let's write out a few of the equations. Here's the actual regression equation that we've estimated:

Y = 8.190 + (-.524)(X) + (-4.549)(Z1) + (-5.315)(Z2) + (-9.271)(Z3) + (-5.019)(Z4)

+ (.812)(X*Z1) + (1.106)(X*Z2) + (1.394)(X*Z3) + (.998)(X*Z4)

We can better appreciate what each coefficient is telling us if we consider some scenarios. For instance, suppose a given observation has X = 0, and is in group Z = 0. What would this mean for Z1? Because Z1 is not "activated," we would enter zero for it. Similarly for Z2, Z3 and Z4. And since X*Z1 through X*Z4 are indicators of an interaction with Z, these would all be zero as well. So, we would have:

Y = 8.190 + (-.524)(0) + (-4.549)(0) + (-5.315)(0) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(0) + (.998)(0)

= 8.190

So, for an observation with zero on X, and in the reference group (Z = 0), the predicted value on Y is equal to 8.190, which is the value of the intercept. Now, suppose an observation has an X score equal to 10, but is still in group Z = 0. The predicted value would be:

Y = 8.190 + (-.524)(10) + (-4.549)(0) + (-5.315)(0) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(0) + (.998)(0)

= 2.95

The predicted value for someone having a score of X = 10, and in the reference group (Z = 0) is equal to 2.95. Notice that the "10" on X brought the score down from the intercept value of 8.190. This is because the coefficient for X, which is the slope for the reference group, is negative. A unit increase in X is associated with an expected change in Y of -.524 units, and in the above we multiplied -.524 by 10 because X = 10. It's very reasonable then that our predicted Y dropped quite a bit.

We can keep producing predicted values for various combinations in the equation. Let's do one more. Assume the observation has X = 10 again, but instead of being in the reference group, the observation is in group 3 (Z = 3). Then we would have the following, being sure to "activate" Z3 (or "indicate" the variable, which is why we call it an indicator). Because Z3 = 1, the product term X*Z3 equals 10 rather than 0:

Y = 8.190 + (-.524)(10) + (-4.549)(0) + (-5.315)(0) + (-9.271)(1) + (-5.019)(0)

+ (.812)(0) + (1.106)(0) + (1.394)(10) + (.998)(0)

= 8.190 -.524(10) -9.271(1) + 1.394(10)

= 8.190 -5.24 - 9.271 + 13.94

= 7.619

Let's do one more, this time for an observation actually in our data. Let's take observation 11 in our data. It has a Y value of 8, an X value of 5, is in group Z = 2, and therefore has a product term X*Z2 = 5. Its equation would be the following:

Y = 8.190 + (-.524)(5) + (-4.549)(0) + (-5.315)(1) + (-9.271)(0) + (-5.019)(0)

+ (.812)(0) + (1.106)(5) + (1.394)(0) + (.998)(0)

= 8.190 -2.62 -5.315 + 5.53

= 5.785

Notice that our answer, within slight rounding error, is the predicted value produced by SPSS (count down to observation 11). What was the *actual* Y? It was 8, so we can say that our model didn't reproduce this data point as well as we would have wanted (but still did a decent job, depending on what the standard error of residuals turns out to be):

Why Bother Calculating Predicted Values Manually?

It may seem like a trivial exercise to calculate a few predicted values manually, but it isn't trivial at all. Rather, it's excellent practice at specifying the actual model equations, and at improving our understanding of what the coefficients mean in a relatively complex regression with interaction regressors. For instance, if you were asked what the model equation looked like for someone with X = 0 in the reference group (Z = 0), you'd have no trouble writing it out. Similarly, if you were asked what the model equation looked like for someone with X = 10 and in group 4 (Z = 4), you'd again have little difficulty in writing out the equation. Yes, software will compute predicted values for us, but it's always useful to try a few on your own to make sure you're clear on how these predicted values are being produced, especially when the regression model is relatively complex and involves interaction terms. It also helps you familiarize yourself with the model equations.
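As a way of checking hand calculations like these, the estimated equation can be wrapped in a small function. This is a sketch in Python rather than SPSS, with names of our own choosing; it evaluates the two cases posed in the paragraph above and the two observations from the data worked earlier in these notes.

```python
# Estimates from the full model (uncentered X), as reported in these notes.
INTERCEPT = 8.190
B_X = -0.524
B_Z = {1: -4.549, 2: -5.315, 3: -9.271, 4: -5.019}    # Z1..Z4
B_XZ = {1: 0.812, 2: 1.106, 3: 1.394, 4: 0.998}       # X*Z1..X*Z4


def predict(x, group):
    """Predicted Y for a case with score x in `group` (0 = reference group)."""
    y_hat = INTERCEPT + B_X * x
    if group != 0:
        # The dummy for the case's own group is 1, so its product term equals x.
        y_hat += B_Z[group] + B_XZ[group] * x
    return y_hat


print(round(predict(0, 0), 3))    # X = 0, reference group
print(round(predict(10, 4), 3))   # X = 10, group 4
print(round(predict(6, 1), 3))    # observation 1 (X = 6, group 1)
print(round(predict(5, 2), 3))    # observation 11 (X = 5, group 2)
```

Writing the function forces the same decisions as writing the equation by hand: which dummy is "activated," and what value the matching product term takes.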

Mean Centering the Continuous Predictor X

To aid in the interpretation of parameter estimates, it's helpful to mean center the continuous predictor before running the analysis. The mean of variable X is equal to 5.88. To mean center, we command SPSS to subtract this value from each X data point for each individual:

COMPUTE X_cent = X-5.88.

EXECUTE.
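The centering step itself is simple enough to verify outside SPSS. The sketch below (Python, names are ours) recovers the mean of 5.88 from the group means of X reported in these notes, centers the first observation's X of 6, and then uses a few hypothetical scores only to show that centered values always average to zero.

```python
def center(values):
    """Subtract the mean from every value; the result has mean zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Group means of X reported in these notes. Each group has n = 10, so the
# grand mean is the plain average of the five group means.
group_means_x = [5.9, 5.4, 5.2, 6.3, 6.6]
grand_mean = sum(group_means_x) / len(group_means_x)
print(round(grand_mean, 2))

# Observation 1 has X = 6; its centered value is 6 minus the grand mean.
first_centered = round(6 - grand_mean, 2)
print(first_centered)

# Hypothetical scores, used only to show that centered values sum to zero.
scores = [6, 4, 7, 5, 8]
centered = center(scores)
print(centered, sum(centered))
```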

The new values for X are then produced in SPSS:

To verify that SPSS did it correctly, consider the first centered value of .12. It was computed by taking 6 (the X value for observation 1 in our data) minus 5.88, which equals .12. Consider what centering the predictor accomplishes. Recall that previously, the intercept in our regression represented the expected value of Y when X = 0. When we center, the intercept will represent the expected value of Y when X still equals 0, but zero now represents the mean of X, and not truly a zero score on X. To understand this better, consider the centering effect for an observation with a score of 5.88. When we center, we get 0 (5.88-5.88). So, when X_cent = 0, it's actually at the mean of X. Let's produce the relevant product terms using the newly centered X variable:

COMPUTE X_cent_Z1 = X_cent * Z1.
EXECUTE.
COMPUTE X_cent_Z2 = X_cent * Z2.
EXECUTE.
COMPUTE X_cent_Z3 = X_cent * Z3.
EXECUTE.
COMPUTE X_cent_Z4 = X_cent * Z4.
EXECUTE.

The new interaction terms will be produced in SPSS when we take the new products:

The product terms using the centered X variable have now been produced, and we can re-run our regression analysis to learn how to interpret the coefficients when X is centered:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Y
  /METHOD=ENTER X_cent Z1 Z2 Z3 Z4 X_cent_Z1 X_cent_Z2 X_cent_Z3 X_cent_Z4
  /SAVE PRED.

We see that the model R of .637 is identical. This is no surprise, since centering X doesn't change the predictive power of the model; it simply helps us interpret the coefficients a bit better. Centering is a linear transformation, so it was entirely expected that R would remain constant.

Similarly, the statistical significance of the model is identical to when X was not centered. Again, this is expected. Next up, let's take a look at the coefficients. This is where we notice changes:

Here's where centering X will paid dividends in interpretation. Look at the value for the intercept. It is equal to 5.11. What is this value? It is the predicted value for the reference group for when X = 0, which because we've centered X, means that it is the predicted value for the reference group for when X equals its MEAN (and not actually zero as was true when X was not centered). What is the mean of X? It's original mean (before centering) was 5.88. So, the constant 5.11 means that the predicted value for Y for an individual in the reference group with an average amount of X (X = 0, which means X = 5.88 because it's been mean-centered) is 5.110. How do we know this is the predicted value for the reference group? Because if it's for the reference group, then that implies that all Z will equal 0, and we'll have:

Y = 5.110 -.524(X) + .228(Z1) + 1.185(Z2) + (-1.076)(Z3) + .848(Z4) + .812(X*Z1) + 1.106(X*Z2) + 1.394(X*Z3) + .998(X*Z4)

= 5.110 -.524(0) + .228(0) + 1.185(0) + (-1.076)(0) + .848(0) + .812(0) + 1.106(0) + 1.394(0) + .998(0)

= 5.110

Notice that our predicted Y of 5.110 matches the intercept term. Because we centered X, when we input X = 0 we're actually evaluating at the mean of X, rather than at a literal X of 0 as we were before centering. The term "-.524(0)" now says, "at the mean of X".
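The substitution above can also be checked with a few lines of Python. This is only a sketch: the coefficients and the mean of 5.88 are copied from these notes, while the function name predict and the choice to accept X on its original scale are assumptions made for illustration.

```python
# Coefficients copied from the centered-X equation in these notes.
INTERCEPT = 5.110
B_X = -0.524
B_Z = {1: 0.228, 2: 1.185, 3: -1.076, 4: 0.848}   # Z1..Z4
B_XZ = {1: 0.812, 2: 1.106, 3: 1.394, 4: 0.998}   # X*Z1..X*Z4
X_MEAN = 5.88  # original (uncentered) mean of X reported in the notes

def predict(x_raw, group):
    """Predicted Y for a raw X score and a moderator group (0 = reference)."""
    x_cent = x_raw - X_MEAN  # centering: raw 5.88 becomes 0
    y_hat = INTERCEPT + B_X * x_cent
    if group != 0:
        # Only the dummy for the person's own group equals 1.
        y_hat += B_Z[group] + B_XZ[group] * x_cent
    return y_hat

# Reference group, average X: every term except the intercept drops out.
print(round(predict(5.88, 0), 3))  # 5.11
```

Passing the raw mean of 5.88 is the same as setting centered X to 0, which is why the function returns the intercept for the reference group.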

Let's keep X = 0, but evaluate when Z1 = 1. The coefficient for Z1 is .228, which means that, at the mean of X, going from group 0 to group 1 increases the expected Y by .228 units. Realize that this represents a contrast between the reference group and the group coded 1, evaluated at average X (at other values of X the contrast also involves the X*Z1 term). Let's evaluate the equation:

Y = 5.110 -.524(X) + .228(Z1) + 1.185(Z2) + (-1.076)(Z3) + .848(Z4) + .812(X*Z1) + 1.106(X*Z2) + 1.394(X*Z3) + .998(X*Z4)

= 5.110 -.524(0) + .228(1) + 1.185(0) + (-1.076)(0) + .848(0) + .812(0) + 1.106(0) + 1.394(0) + .998(0)

= 5.110 + .228

= 5.338

Hence, we see that being in group 1 compared to group 0 (reference group) increases the predicted Y by .228, resulting in a predicted value of 5.338. To see the value in this kind of prediction, imagine you were to guess the Y score of a person standing behind a closed door. You know nothing about the person, except that you know they have average X, and are in group 1 of the moderator rather than group 0. You could reason as follows: "A good prediction for someone with average X in the reference group would be 5.110. But, if I know that person is in group 1 rather than group 0, I'm going to increase my estimate by .228 units, for a guess of 5.338. That's my best prediction for a person exhibiting these characteristics."
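The same reasoning extends to the other groups. The sketch below (Python, with coefficients copied from the equation above; the names at_mean_x and gap_group1_vs_ref are hypothetical) lists the predicted Y at average X for every group, and then shows that the gap between group 1 and the reference group equals .228 only when centered X is 0.

```python
# Coefficients copied from the centered-X equation in these notes.
INTERCEPT = 5.110
B_Z = {0: 0.0, 1: 0.228, 2: 1.185, 3: -1.076, 4: 0.848}
B_XZ1 = 0.812  # coefficient on X*Z1

# At centered X = 0, every term involving X vanishes, so the prediction
# for each group is the intercept plus that group's dummy coefficient.
at_mean_x = {g: round(INTERCEPT + b, 3) for g, b in B_Z.items()}
for g, y_hat in at_mean_x.items():
    print(f"group {g}: predicted Y at average X = {y_hat:.3f}")

def gap_group1_vs_ref(x_cent):
    """Predicted Y for group 1 minus the reference group at a given centered X."""
    # The shared intercept and the -.524(X) term cancel in the difference.
    return B_Z[1] + B_XZ1 * x_cent

print(round(gap_group1_vs_ref(0.0), 3))  # 0.228, the Z1 coefficient
print(round(gap_group1_vs_ref(1.0), 3))  # 1.04, one unit above the mean of X
```

This is why the "closed door" guess of 5.338 is tied to a person with average X: for someone above or below the mean, the adjustment for being in group 1 would differ from .228.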

An Analysis Without Partialling Out X

Consider an analysis in which we do not partial out X, but rather simply analyze the effect of the Z-variable:

What is the constant 5.100? It is the mean of the group coded 0 (the reference group). The Z1 coefficient of .100 indicates that as we go from group 0 to group 1, the mean increases by .100. So, the mean for group 1 is 5.100 + .100 = 5.200. The Z2 coefficient of .800 indicates that as we go from group 0 to group 2, the mean increases by .800, so the mean for group 2 is 5.100 + .800 = 5.900. For group 3, the coefficient of -.700 indicates that the means of group 0 and group 3 differ by .700, but this time, because the sign of the coefficient is negative, it represents a decrease in the mean rather than an increase. That is, the mean of group 3 is 5.100 - .700 = 4.400. Finally, for group 4, we have a coefficient of 1.200. This means that the difference in means between group 0 (reference group) and group 4 is 1.200. Since the sign is positive, the mean of group 4 is 1.200 units greater than the reference-group mean of 5.100. So, the mean of group 4 is 5.100 + 1.200 = 6.300.
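The arithmetic in the paragraph above can be sketched in Python as follows. The constant and coefficients are the ones quoted in these notes; the variable names are hypothetical.

```python
# Constant and dummy coefficients quoted in the notes (model without X).
CONSTANT = 5.100  # mean of the reference group (group 0)
DUMMY_COEFS = {1: 0.100, 2: 0.800, 3: -0.700, 4: 1.200}

# With dummy coding and no covariate, each group mean is the reference
# mean plus that group's coefficient.
group_means = {0: round(CONSTANT, 3)}
for group, coef in DUMMY_COEFS.items():
    group_means[group] = round(CONSTANT + coef, 3)

for group in sorted(group_means):
    print(f"group {group}: mean = {group_means[group]:.3f}")
```

Rounding to three decimals simply avoids floating-point noise in the sums.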

Notice that without partialling out the continuous predictor X, the means derived from the coefficient table match up to the actual means for the groups, as we can easily verify by obtaining descriptives on the group means across levels of Z. Notice the means in the following descriptives are 5.100, 5.200, 5.900, 4.400, 6.300 for Z groups 0, 1, 2, 3, 4 respectively. These group means match up perfectly with the means we figured out from the coefficient table:

To be continued . . .