Interpretation

The Regression equation

Testing Individual Coefficients

Testing the Overall Equation

The regression equation

The object of a regression problem is to estimate the
coefficients b_{i} in the regression equation:

Y = b_{0} + b_{1}•X_{1} + b_{2}•X_{2} + b_{3}•X_{3} + . . . + e

The equation above is the true relationship between Y and the Xs. When we
have estimates of the model, the estimated equation is denoted by

Y = b_{0} + b_{1}•X_{1} + b_{2}•X_{2} + b_{3}•X_{3} + . . . + e

When we have estimates b_{i},
we can plug values for the X_{i} into the
equation and predict the value of Y that would be associated with those
values of the X_{i}.
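To make the "plug in values" step concrete, here is a minimal sketch in Python for the simplest case of a single X variable. The data are made up for illustration and the function names (fit_line, predict) are our own; JMP performs the same least-squares calculation for any number of X variables.

```python
# Made-up data for illustration only (not taken from this page).
xs = [4.0, 6.0, 8.0, 10.0]
ys = [13.5, 15.5, 18.5, 22.5]

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for Y = b0 + b1*X + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def predict(b0, b1, x):
    """Plug a value of X into the estimated equation."""
    return b0 + b1 * x

b0, b1 = fit_line(xs, ys)
print(f"estimated equation: Y = {b0:.2f} + {b1:.2f}*X")
print(f"predicted Y at X = 7: {predict(b0, b1, 7.0):.2f}")
```

With these made-up numbers the estimated equation is Y = 7 + 1.5•X, so an X of 7 gives a predicted Y of 17.5.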

Testing individual coefficients

Once we have this equation, we can test whether the X
variables belong in the equation--that is, whether they really contribute
to explaining the changes in Y. If an X does not affect Y, its coefficient
would be zero. For example, if b_{2} = 0 then the equation above would be

Y = b_{0} + b_{1}•X_{1} + 0•X_{2} + b_{3}•X_{3} + . . . + e

or, with the zero term dropped,

Y = b_{0} + b_{1}•X_{1} + b_{3}•X_{3} + . . . + e

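To test whether a b_{i} is really equal to zero, we use the *t* test
for the estimated coefficient. The *t* statistic is the estimated
coefficient divided by its standard error. Large values of *t* (in
absolute value), or equivalently small *p* values, indicate that there
is a relationship between that X and Y.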
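As a rough illustration of where the *t* statistic comes from, the sketch below computes it by hand for a regression with one X variable. The data are made up, and the function name slope_t_statistic is our own. The *p* value that JMP reports comes from the t distribution with n - 2 degrees of freedom in this one-X case; this sketch does not compute it.

```python
import math

# Made-up data for illustration only (not taken from this page).
xs = [4.0, 6.0, 8.0, 10.0]
ys = [13.5, 15.5, 18.5, 22.5]

def slope_t_statistic(xs, ys):
    """Return (b1, standard error of b1, t) for Y = b0 + b1*X + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 because two coefficients were estimated.
    s2 = sse / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return b1, se_b1, b1 / se_b1

b1, se_b1, t = slope_t_statistic(xs, ys)
print(f"b1 = {b1:.3f}, standard error = {se_b1:.3f}, t = {t:.2f}")
```

Here the slope of 1.5 is about 9.5 standard errors away from zero, which is strong evidence that X belongs in the equation.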

Testing the Overall Relationship

The strength of the overall relationship is measured
by R² (called the coefficient of determination). It measures
the fraction of the total variation in the Y variable that can be explained
by the variation in the X variables. It has a value of 1.00 if all
of the data points in the graph lie exactly on a straight line--indicating
that all of the variation in Y can be accounted for by the variation in
the X variables. It has a value of zero if there is no relationship
between *any* of the X variables and the Y variable. The two
graphs below show the results of two regressions and their R²s.
(SEE stands for Standard Error of Estimate and in this context it measures
the variation of the data points away from the regression line.)
Note that in the first regression, the data points lie close to the regression
line and R² is large. This reflects the fact that most of the
variation of Y (from 13 through 23) is due to the fact that X changes (from
4 through 10). The second graph shows more variation ("deviation")
away from the regression line. The changes in X predict the same
changes in Y (the predicted values of Y are 13, 16, 19, 22), but now there
is an extra source of variation in Y, so a smaller fraction of the total
variation in Y can be explained by X. Thus, the R² is smaller
in the second graph.
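The comparison can be reproduced with a small sketch. The two data sets below are made up (they are not the data behind the graphs), but they are chosen so that both give the same fitted line, Y = 7 + 1.5•X, with predicted values 13, 16, 19, 22. Only the scatter around the line differs. The function name r_squared_and_see is our own, and the SEE formula assumes a single X variable.

```python
import math

# Made-up data: the same X values, with Y close to or far from the line.
xs = [4.0, 6.0, 8.0, 10.0]
ys_tight = [13.5, 15.5, 18.5, 22.5]   # residuals of +/- 0.5
ys_loose = [15.0, 14.0, 17.0, 24.0]   # residuals of +/- 2.0

def r_squared_and_see(xs, ys):
    """Return (R-squared, SEE) for a least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Unexplained variation: squared distances from the regression line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of Y around its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - sse / sst
    see = math.sqrt(sse / (n - 2))
    return r2, see

for label, ys in [("tight", ys_tight), ("loose", ys_loose)]:
    r2, see = r_squared_and_see(xs, ys)
    print(f"{label}: R-squared = {r2:.3f}, SEE = {see:.3f}")
```

The explained variation is the same in both cases, but the looser data set has more total variation, so its R² is smaller (about 0.74 versus about 0.98) and its SEE is larger.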

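To test the overall regression equation, we use the F
statistic. The null hypothesis is that none of the X variables helps to
explain the value of Y; that is, all of the coefficients on the X
variables are zero. A large F, or a small *p* value for F, is evidence
against that hypothesis. Usually, if any of the *t* statistics show that a
coefficient is significantly different from zero, then the F will be large.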
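For a regression that includes an intercept, the F statistic can be written in terms of R², which shows how the two measures are connected. The sketch below uses that standard formula. The function name f_statistic and the input values are hypothetical, chosen only to illustrate the calculation.

```python
def f_statistic(r2, n, k):
    """F statistic for the null hypothesis that all k slope coefficients
    are zero, given R-squared from n observations (model has an intercept).
    """
    explained_per_variable = r2 / k
    unexplained_per_df = (1.0 - r2) / (n - k - 1)
    return explained_per_variable / unexplained_per_df

# Hypothetical inputs: R-squared of 0.75, 20 observations, 3 X variables.
f_value = f_statistic(0.75, n=20, k=3)
print(f"F = {f_value:.2f} with 3 and 16 degrees of freedom")
```

An R² of zero gives an F of zero, and F grows as R² approaches 1. Whether a given F is "large" is judged against the F distribution with k and n - k - 1 degrees of freedom, which is what the *p* value reported by JMP reflects.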

last updated March 21, 2001, by James R. Frederick

Copyright 2001 James R. Frederick