The University of North Carolina at Pembroke
Multiple Regression in JMP IN

Check assumptions
To test the normality of the residuals
To test for autocorrelation of the residuals
To test the linearity of the model
Homoskedasticity

Multicollinearity

Estimation of the model
To estimate a predetermined model
To select a model for estimation

Check for outliers, leverage points, validity
Outliers
Leverage points
Influential observations
To validate the model

Selecting a model
In this context, a model is a way of representing the relationships in the data.  It is a mathematical equation having a certain set of variables and certain functional forms of those variables (X², Ln X, etc.).
Often a researcher starts with a list of regressors (Xs) that might be relevant to changes in the regressand (Y) and wants to see which of the regressors actually are related to the regressand.  Let the number of potential regressors be h.  There are four common approaches to selecting variables for inclusion in the model:
• All possible subsets
• Backward elimination
• Forward addition
• Stepwise regression
These methods consider various subsets of the h potential regressors.  There are several criteria for choosing the best model from among those tried.  Some of them are:
• R²
• Adjusted R²
• F
• Cp
Each of these criteria assumes that the regressand, Y, remains exactly the same, so they should only be used to compare models that have the same Y.  Using the square root of Y instead of Y will reduce the variation in the regressand (assuming that Y > 1/4) regardless of whether the square-root form is really a better model.
The R² criterion should only be used to compare models that have the same number of regressors.  Higher values of R² are better.  A perfect fit would give an R² of 1.0.  The lowest possible R² is 0.
The adjusted R² can be used to compare models that have different numbers of regressors.  Again, higher adjusted R² values are better and the highest possible adjusted R² is 1.0.  Unlike the R², the adjusted R² can be negative.
The F refers to the F statistic from the Analysis of Variance section of the Fit Least Squares report window.  There is a functional relationship between F and R² that depends on the sample size and the number of coefficients in the model.
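That relationship can be written out: with k regressors and n observations, F = (R²/k) / ((1 - R²)/(n - k - 1)).  A quick Python sketch (using made-up data, not JMP output) confirms that this formula and the ANOVA-table formula give the same F:

```python
import numpy as np

# Simulated regression with k = 2 regressors and n = 20 observations.
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ b) ** 2)      # Error sum of squares
sst = np.sum((y - y.mean()) ** 2)   # C. Total sum of squares
r2 = 1 - sse / sst

f_anova = ((sst - sse) / k) / (sse / (n - k - 1))   # from the ANOVA table
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))     # from R² alone
print(abs(f_anova - f_from_r2) < 1e-8)  # → True
```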

All-possible-subsets regression does exactly what the name implies: it regresses Y on all possible subsets of the X variables.  If there are h potential regressors, there will be 2^h regressions to perform.  This is not much of a problem for modern computers.  However, it will give a lot of output to sort through.
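As a sketch of how all-possible-subsets selection works, the following Python fragment enumerates all 2^h subsets of h = 3 potential regressors and keeps the one with the highest adjusted R².  The data are simulated and the variable names are invented for illustration; this is not JMP output.

```python
import itertools
import numpy as np

# Simulated data: h = 3 potential regressors gives 2**3 = 8 candidate
# models (including the intercept-only model).  X3 is irrelevant by design.
rng = np.random.default_rng(1)
n, h = 30, 3
X = rng.normal(size=(n, h))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)

def adj_r2(cols):
    """Adjusted R-squared of the model using the regressors listed in cols."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    sse = np.sum((y - Z @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    k = len(cols)
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

subsets = [c for r in range(h + 1) for c in itertools.combinations(range(h), r)]
best = max(subsets, key=adj_r2)
print(best)  # the chosen subset should include the relevant regressors 0 and 1
```

The same loop works for any of the criteria listed above; only the `adj_r2` function would change.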

The other three methods use short cuts to avoid having to regress all possible subsets.  Usually they work well, but they can miss the best model.  When deciding whether to add a regressor to the model or delete a regressor from the model, these methods use criteria to judge the contribution of the regressor to the model as a whole.  Some criteria are:

• the t ratio for the regressor's slope coefficient.  Large absolute values of t indicate that the regressor should be included in the model.
• the F ratio for a group of regressors, or for a single regressor in which case it is equivalent to a t.  Large F values indicate that the group of regressors should be included in the model.  F is especially useful when there is a group of dummy variables to be included.
Backward elimination starts by regressing Y on all of the h regressors and eliminates those regressors whose t statistics have high p values.  The model is regressed again with the remaining variables. The process is repeated until there are no more variables to be eliminated--all remaining variables have significant coefficients.

Forward addition starts with only the intercept and then performs h regressions with the intercept and each regressor, one at a time.  The regressor that contributes the most to the explanation of Y is added to the model.  The next step is to perform h-1 regressions with the intercept, the first variable, and each of the h-1 remaining regressors to find the second most important variable to add to the model.  The process is repeated until none of the remaining regressors has a significant contribution to the model, given the regressors that are already in the model.

Stepwise regression is similar to forward addition except that after each variable has been added to the model, the t statistics of the regressors in the model are examined to see whether any of them should now be dropped from the model.  The criterion for adding a variable to the model should be more stringent than the criterion for keeping a variable in the model.  That is, t_in > t_out, or F_in > F_out, or α_in < α_out.  Otherwise, the process could get stuck in a loop of adding a variable and then removing it, only to add it in again.
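The add-then-check-for-drops loop can be sketched in Python.  This is only an illustration with simulated data, not JMP's exact algorithm; ALPHA_IN and ALPHA_OUT play the roles of α_in and α_out above, with ALPHA_IN < ALPHA_OUT so the loop cannot cycle.

```python
import numpy as np
from scipy import stats

# Simulated data: regressors 0 and 2 truly matter, 1 and 3 do not.
rng = np.random.default_rng(2)
n, h = 60, 4
X = rng.normal(size=(n, h))
y = 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)
ALPHA_IN, ALPHA_OUT = 0.05, 0.10   # entry criterion stricter than exit

def p_values(cols):
    """Two-tailed p values for the slope coefficients of the regressors in cols."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ b
    df = n - Z.shape[1]
    se = np.sqrt((resid @ resid / df) * np.diag(np.linalg.inv(Z.T @ Z)))
    return 2 * stats.t.sf(np.abs(b / se), df)[1:]   # skip the intercept

model = []
while True:
    # add the candidate regressor with the smallest entry p value, if any qualifies
    candidates = [j for j in range(h) if j not in model]
    entry = [(p_values(model + [j])[-1], j) for j in candidates]
    if not entry or min(entry)[0] >= ALPHA_IN:
        break
    model.append(min(entry)[1])
    # drop any regressor whose p value has risen above ALPHA_OUT
    p = p_values(model)
    model = [j for j, pj in zip(model, p) if pj < ALPHA_OUT]
print(sorted(model))   # should contain the truly relevant regressors, 0 and 2
```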

Stepwise regression is performed by clicking on Analyze on the Top Menu, then clicking on Fit Model.  After selecting a Y variable (highlight the variable on the left and click on the Y button) and a group of X variables (highlight the Xs on the left and click on the Add button), click on the triangle in the upper-right hand corner of the window (in the box labeled "Personality:").  From the options that appear, select Stepwise.  The Fit Stepwise window will appear.  Click on the Go button, then on the Make Model button.  (If you want to see the individual steps in the regressor selection process, you can click on the Step button repeatedly instead of using the Go button.)  A new Stepped Model window will appear that is similar to the Fit Model window.  Click on Run Model in this window.  Your regression results will appear after about a second.

Test normality of the residuals

How to do it:
Save the residuals
From the Fit Least Squares screen, click on the red triangle beside Response,
then click on Save Columns,
then Residuals.
From the Top Menu Bar, click on Analyze, then Distribution, then Fit Distribution.
The Fit Distribution dialog box will appear.  Choose the column of residuals as Y and click OK.
The Distribution report window will appear.  Click on the second pop-up menu (red triangle), then click on Fit Distribution, then highlight Normal.  JMP will add a section to the Distribution window which provides information about the Fitted Normal distribution.  Click on the pop-up menu on the Fitted Normal bar.  The W test statistic for the Shapiro-Wilk test and its p value will appear in the Fitted Normal section.  (If the sample size is greater than 2000, the Kolmogorov-Smirnov-Lilliefors statistic will appear instead of the Shapiro-Wilk W.)  The null hypothesis of this test is that the data do come from a normal distribution.  Small p values indicate that the hypothesis of normality of the residuals should be rejected.
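For readers working outside JMP, the same Shapiro-Wilk test is available in scipy.  The residuals below are simulated stand-ins for the column you would save from JMP.

```python
import numpy as np
from scipy import stats

# Simulated residuals; in practice, load the column saved from JMP.
rng = np.random.default_rng(3)
residuals = rng.normal(loc=0.0, scale=2.0, size=100)

w, p = stats.shapiro(residuals)
# Small p values reject the null hypothesis that the residuals are normal.
print(f"W = {w:.4f}, p = {p:.4f}")
```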

What happens if the residuals are not normal:
The estimates of the regression coefficients are still unbiased and still have the smallest variances among linear unbiased estimators.  However, we can no longer use the Z, t, and F distributions to test the coefficients or the model as a whole.

What to do about nonnormal residuals:
Sometimes a transformation of the Y values will create normally distributed residuals.
You might try replacing Y with Y raised to some power (positive or negative, less than one or greater than one), e^Y, or Ln Y.
Transforming Y may create other problems, such as nonlinearity or heteroskedasticity.

To check for autocorrelation of the residuals
Autocorrelation of the residuals occurs when the residuals are correlated with lagged values of themselves; that is, when e_t tends to be correlated with e_t-s.  The Durbin-Watson statistic tests for correlation between e_t and e_t-1, which is called serial correlation.

The Durbin-Watson statistic will be near 2.0 if there is no autocorrelation.
If the statistic is near 0.0, there is evidence of positive autocorrelation (positive residuals tend to be followed by positive residuals, and negative residuals tend to be followed by negative residuals).
On the other hand, if the statistic is near 4, there is evidence of negative autocorrelation (positive residuals tend to be followed by negative residuals, and vice versa).
Note that the Durbin-Watson statistic should not be used when one of the regressors is a lagged value of the regressand.
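The statistic itself is easy to compute from a saved column of residuals: DW = Σ(e_t - e_t-1)² / Σe_t².  A Python sketch with simulated residuals shows the benchmark cases; for an AR(1) process with coefficient ρ, DW is approximately 2(1 - ρ).

```python
import numpy as np

# Durbin-Watson statistic computed directly from a column of residuals.
def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
white = rng.normal(size=500)     # independent residuals

ar = np.empty(500)               # positively autocorrelated residuals:
ar[0] = white[0]                 # e_t = 0.9 * e_{t-1} + u_t
for t in range(1, 500):
    ar[t] = 0.9 * ar[t - 1] + white[t]

print(round(durbin_watson(white), 2))  # near 2: no autocorrelation
print(round(durbin_watson(ar), 2))     # near 2*(1 - 0.9) = 0.2: positive autocorrelation
```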

How to do it:
In the Fit Least Squares report window, click the pop-up menu (red triangle) beside the Response, then click on Row Diagnostics, then click Durbin Watson Test to place a check beside it.  The Durbin-Watson statistic will appear near the bottom of the Fit Least Squares window.  To get the p value for the DW statistic, click on the pop-up menu on the Durbin-Watson bar, and then click Significance P Value.  The p value is for the DW statistic under the null hypothesis that there is no autocorrelation among the residuals.

This p value is the probability of finding a smaller DW statistic in a new random sample if there is no autocorrelation.  When DW = 0.00, 2.00, and 4.00, the p values will be 0.0, 0.5, and 1.0, respectively.  If we have no prior reason to believe that the autocorrelation should be positive or negative, then we should use a two-tailed rejection region here.  Using α = .05 we would reject the null hypothesis of no autocorrelation whenever the p value < 0.025 or the p value > 0.975.  Using α = .05 with a left-tailed test for positive autocorrelation, we would reject the null hypothesis whenever the p value < 0.05.  Using α = .05 with a right-tailed test for negative autocorrelation, we would reject the null hypothesis whenever the p value > 0.95.

What happens if there is autocorrelation among the residuals:
Essentially, it is as if there were fewer observations in the sample than there really are.  Because the data are not independent of each other, a sample with 200 autocorrelated observations does not have as much information in it as a sample of 200 uncorrelated observations.
The regression coefficients are unbiased, but the estimates of the variances will be biased.  When there is positive autocorrelation, s² will be too small (systematically smaller than the true σ²), so it will be too easy to reject the null hypothesis.  A researcher who thinks he is using α = 5% could actually be using α = 15%.

What to do about autocorrelated residuals:
One approach is to try replacing Y with ΔY and each X with ΔX; that is, regressing the first differences.  In this form b0 should be close to zero.
Another approach is a two-step generalized least squares procedure, such as Cochrane-Orcutt.

Check the linearity of the model
How to check linearity:
In the Fit Least Squares report window, look at the graph labeled Residual by Predicted Plot.
If the points in the graph trace out a U-shaped pattern or an inverted U, there are nonlinear effects that you have not incorporated into your model.

Try different transformations of Y or the Xs.

Interpreting the JMP output
The object of multiple regression is to estimate an equation of the form:

Y = β0 + β1•X1 + β2•X2 + β3•X3 + . . . + e

The estimates of the βi (denoted by bi) are given in the Parameter Estimates section of the Fit Least Squares report window.  "Term" indicates the coefficient, with "Intercept" indicating b0.  "Estimate" gives the value of the bi.  "Std Error" gives the standard error (standard deviation) of the estimate, bi.  "t Ratio" gives the test statistic t for a test of the null hypothesis that βi = 0.  If this hypothesis is true, then Xi has no effect on Y and can be deleted from the regression model.  "Prob>|t|" gives the p value for a two-tailed test of the null hypothesis that βi = 0.  Small values in this column indicate that the corresponding X variable really does have an effect on Y.

Whereas the t statistics in the Parameter Estimates section test the importance of individual X variables, the Analysis of Variance section of the Fit Least Squares report window provides a test of the overall equation.  The F statistic reported in the last column, and its p value below it, test the null hypothesis that all βi, except for the intercept β0, are equal to 0.  To reject the null hypothesis, the F ratio must be significantly larger than 1.0.  Sometimes multicollinearity among the Xs can result in small t ratios even though the F is large.  This would indicate that some combination of the Xs can explain Y, but it is impossible to tell exactly which Xs are responsible for the correlation.

The fundamental variance in the equation is estimated by the Mean Squared Error (MSE), which is found under the "Mean Square" column in the "Error" row in the Analysis of Variance section of the Fit Least Squares report window.  The MSE estimates the variance of the errors, e.

The "RSquare" and "RSquare Adj" are the coefficient of determination and the adjusted coefficient of determination.  They estimate the fraction of the overall variation in Y (as measured by the C.(orrected) Total Sum of Squares in the Analysis of Variance section) that can be explained by the changes in the Xs (as measured by the Model Sum of Squares in the Analysis of Variance section).  A low R² indicates that there are some important variables, besides the ones you included in your model, that contribute to the variation in Y.  (It may be impossible to measure these variables.)
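Both quantities can be recomputed from the ANOVA sums of squares.  In the Python sketch below, n, k, and the sums of squares are made-up illustrative numbers, not output from any particular JMP run.

```python
# RSquare and RSquare Adj from the ANOVA sums of squares (illustrative values).
n, k = 40, 3                     # observations, regressors
ss_model, ss_error = 180.0, 60.0
ss_total = ss_model + ss_error   # the C. Total sum of squares

r2 = ss_model / ss_total
r2_adj = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))
print(round(r2, 3), round(r2_adj, 3))  # → 0.75 0.729
```

Note that the adjusted R² penalizes each added regressor through the n - k - 1 term, which is why it can be used to compare models with different numbers of regressors.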

Outliers: an outlier is a point that has an unusually large residual, e.  (e = Y - Y^, where Y^ is the predicted value of Y that corresponds to the same observed values of the Xs.  The ^ would ordinarily be written over the Y and it would be called "Y-hat".  Y^ = b0 + b1•X1 + b2•X2 + b3•X3, whereas Y = β0 + β1•X1 + β2•X2 + β3•X3 + e.)  A good graph for detecting outliers is the Residual by Predicted Plot.  (If this plot does not appear in your Fit Least Squares report window, click on the red pop-up menu by the Response bar, click on Row Diagnostics, then click on Plot Residual by Predicted.)  Points that lie significantly above or below the rest of the data are outliers.

Leverage points: a leverage point is an observation that has an unusual value for one of its Xs.  A good graph for detecting leverage points is the Leverage Plot near the top of the Fit Least Squares report window.  There is a Leverage Plot for each regressor, X.  The slope of the red line through these plots is the slope estimate, b.  Points which are far to one side of the average X value will be able to exert strong influence over b.  If there is a single point at an unusual value of X, it will exert more influence over b than would, say, one of three points at that value of X.

Influential observations: an influential observation is any observation that, if omitted from the regression, would have a large effect on the parameter estimates of the model.  Often, influential observations are outliers or leverage points.  Two measures of influential observations are often used:

• Cook's distance.  Some statisticians suggest that an observation is influential if its Cook's distance is greater than the 50th percentile of an F distribution with k and n-k degrees of freedom.  In JMP, this number can be found by opening a new data table, adding one row, right-clicking on the top of the first column, and selecting Formula.  In the formula editor, enter F Quantile(0.5, k, n-k), where k and n-k would be replaced by the appropriate numbers.  The first number for the function is the probability as a decimal, not as a percentage.  In Excel, use the function =FINV(.5,k,n-k).  When n - k is thirty or more, this threshold F value will always be less than 1.  So, if n - k is greater than 30 and an observation's Cook's distance is greater than 1.0, consider that observation an influential observation.  JMP does not show Cook's distance in the report window, but you can save the Cook's distances for all observations in the data table.  Click on the pop-up menu at the top of the Fit Least Squares report window, click on Save Columns, and then check Cook's D Influence.
• DFFITS.  Some statisticians suggest that an observation is influential if the absolute value of its DFFITS is greater than 2*SQRT(k/n), where k is the number of regressors in the model + 1 (that is, the number of b coefficients) and n is the sample size.
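Both cutoffs are easy to compute outside JMP or Excel.  A Python sketch, where n and k are illustrative values:

```python
import numpy as np
from scipy import stats

# Cutoffs for the two influence measures described above (illustrative n and k).
n, k = 100, 4                              # sample size, number of b coefficients
cooks_cut = stats.f.ppf(0.5, k, n - k)     # same value as F Quantile(0.5, k, n-k)
dffits_cut = 2 * np.sqrt(k / n)            # 2*SQRT(k/n)
print(round(cooks_cut, 3), round(dffits_cut, 3))
```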

To validate the model:
There are several ways to validate the model.  Essentially, they involve repeating the regression on data that were not used in calculating the original estimates of the model.  Some methods are
•  use a new data set,
•  use a hold-out data set (or validation data set) from the original sample; that is, data set aside from the original data set and used to validate the model rather than used in the original estimation of the model.
•  split halves.  After estimating the model, the data set may be split and the model run on each half.  If the results from each half are different from the estimates from the other half or from the whole model, then there is evidence that the original results are not valid.
•  calculate the PRESS statistic (PRedicted Error Sum of Squares).  The PRESS cannot be less than the SSE (Sum of Squared Errors, found in the Analysis of Variance section of the Fit Least Squares window), but it should be close to the SSE.  If the PRESS is several times larger than the SSE, there is a validity problem.  The SSE is the sum of squared residuals.  The PRESS is like the SSE except that the residual for each observation is calculated from a new regression that omits the observation from the estimation.  This is similar to turning each observation into a hold-out sample of size 1.
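The PRESS does not actually require n separate regressions: the leave-one-out residual for observation i equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix H = X(X'X)⁻¹X'.  A Python sketch with simulated data:

```python
import numpy as np

# PRESS via the hat-matrix shortcut, on simulated data.
rng = np.random.default_rng(5)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
e = y - H @ y                              # ordinary residuals
sse = np.sum(e ** 2)
press = np.sum((e / (1 - np.diag(H))) ** 2)
print(press >= sse)  # → True: PRESS can never be smaller than the SSE
```

Because every h_ii lies between 0 and 1, each leave-one-out residual is at least as large in magnitude as the ordinary residual, which is why PRESS ≥ SSE always holds.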

How to calculate the PRESS:
In the Fit Least Squares report window, click the pop-up menu (red triangle) beside the Response, then click on Row Diagnostics, then click Press to place a check beside it.  The PRESS statistic will be reported immediately below the graph labeled Residual by Predicted Plot.

What does lack of validity do?
Lack of validity can result from outliers, influential observations, or leverage points.  These are occasional data points that have a significant effect on the results.  Lack of validity means that the results you found may be due to these unusual data points and may not be true in other samples or in the population as a whole.

What to do about lack of validity?
If the outliers, influential observations, or leverage points can be identified, you might be able to detect something unusual about these observations.  This could lead you to include a new variable to accommodate these data points.  Sometimes they will reveal a new variable that is relevant to your model.  Sometimes they may simply reveal that one of the researchers observed data differently than the others did, or that on Friday afternoons your observations were not as good as they were on other days.

Proof that the variation in the square root of Y is less than the variation in Y when Y > 1/4.
The key point is whether the slope of the square-root transformation is greater than 1 or less than 1.  If the slope is greater than 1, the transformation increases the spread among the Y values, but if the slope is less than 1, it decreases the spread.  So we need to find the values of Y for which the slope of Y^0.5 is less than 1.  The slope is d(Y^0.5)/dY = 0.5•Y^-0.5.  We set this equal to 1 and solve for Y:  0.5•Y^-0.5 = 1, so Y^0.5 = 0.5, so Y = 0.5² = 0.25.  For any Y greater than 0.25, the slope is less than 1 and the square-root transformation reduces the spread.
See?  There really is a reason to study calculus.  Now go impress your spouse with it.
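The claim is also easy to check numerically.  In this Python snippet the Y values are made up, but all exceed 1/4, so every gap between successive values shrinks after the square-root transformation:

```python
import numpy as np

# All values exceed 1/4, so the square-root transformation compresses gaps.
y = np.array([0.30, 1.0, 4.0, 9.0])
gaps = np.diff(y)
sqrt_gaps = np.diff(np.sqrt(y))
print(np.all(sqrt_gaps < gaps))  # → True
```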

created by James R. Frederick, March 28, 2001