Linear Regression:
Interpretation

The regression equation
The object of a regression problem is to estimate the coefficients bi in the regression equation:

Y = b0 + b1•X1 + b2•X2 + b3•X3 + . . . + e
The equation above is the true relationship between Y and the Xs.  Once the coefficients have been estimated, the estimated equation is written in the same form, with each bi replaced by its estimate:

Y = b0 + b1•X1 + b2•X2 + b3•X3 + . . . + e

When we have estimates bi, we can plug values for the Xi into the equation and predict the value of Y that would be associated with those values of the Xi.
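
As a minimal sketch of this plug-in step, the Python function below evaluates an estimated equation at given values of the Xi.  The function name predict is hypothetical, and the coefficient values b0 = 7 and b1 = 1.5 are an assumption made for illustration: they reproduce the predicted values 13, 16, 19, 22 quoted later in these notes if X takes the values 4, 6, 8, 10.

```python
def predict(coefficients, x_values):
    """Return b0 + b1*X1 + b2*X2 + ... for one observation.

    coefficients: [b0, b1, b2, ...] where b0 is the intercept
    x_values:     [X1, X2, ...], one value per slope coefficient
    """
    if len(coefficients) != len(x_values) + 1:
        raise ValueError("need one intercept plus one slope per X")
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, x_values))


# Assumed estimates for illustration only: b0 = 7, b1 = 1.5 (one X variable).
estimates = [7.0, 1.5]
predictions = [predict(estimates, [x]) for x in (4, 6, 8, 10)]
print(predictions)  # [13.0, 16.0, 19.0, 22.0]
```

With more X variables, the same call takes a longer list of coefficients and one value for each X.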

Testing individual coefficients
Once we have this equation, we can test whether the X variables belong in the equation--that is, whether they really contribute to explaining the changes in Y.  If an X does not affect Y, its coefficient would be zero.  For example, if b2 = 0 then the equation above would be

Y = b0 + b1•X1 + 0•X2 + b3•X3 + e
or
Y = b0 + b1•X1 + b3•X3 + e

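To test whether a bi is really equal to zero, we use the t test for the estimated coefficient.  A t statistic that is large in absolute value, or equivalently a small p value (for example, below 0.05), is evidence that the coefficient is not zero, that is, that there is a relationship between that X and Y.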
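
The t statistic is the estimated coefficient divided by its standard error.  The sketch below works this out in Python for the simplest case of one X variable.  The two data sets are made up for illustration (they are not the data behind any graph in these notes), and the function name slope_t_statistic is hypothetical.  Instead of computing a p value, the sketch compares |t| with 4.303, the two-sided 5% critical value of t with 2 degrees of freedom.

```python
import math


def slope_t_statistic(x, y):
    """Fit Y = b0 + b1*X by least squares; return (b1, standard error of b1, t)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares; n - 2 degrees of freedom (two estimated coefficients).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, se_b1, b1 / se_b1


x = [4, 6, 8, 10]
y_tight = [13.5, 15.5, 18.5, 22.5]  # made-up data close to a line
y_noisy = [15.0, 14.0, 17.0, 24.0]  # made-up data, same fitted line, more scatter

T_CRITICAL = 4.303  # two-sided 5% critical value, t with n - 2 = 2 degrees of freedom

for label, y in (("tight", y_tight), ("noisy", y_noisy)):
    b1, se_b1, t = slope_t_statistic(x, y)
    print(label, round(b1, 3), round(se_b1, 3), round(t, 2), abs(t) > T_CRITICAL)
```

Both data sets give the same estimate b1 = 1.5, but the t statistic is about 9.49 for the first and about 2.37 for the second, so only the first rejects b1 = 0 at the 5% level.  The same slope estimate is far less convincing when the scatter around the line is large.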

Testing the overall relationship
The strength of the overall relationship is measured by R² (called the coefficient of determination).  It measures the fraction of the total variation in the Y variable that can be explained by the variation in the X variables.  It has a value of 1.00 if all of the data points in the graph lie exactly on a straight line, indicating that all of the variation in Y can be accounted for by the variation in the X variables.  It has a value of zero if the data show no linear relationship between any of the X variables and the Y variable.

The two graphs below show the results of two regressions and their R²s.  (SEE stands for Standard Error of Estimate; in this context it measures the variation of the data points away from the regression line.)  In the first regression, the data points lie close to the regression line and R² is large.  This reflects the fact that most of the variation in Y (from 13 through 23) is due to the change in X (from 4 through 10).  The second graph shows more variation ("deviation") away from the regression line.  The changes in X predict the same changes in Y (the predicted values of Y are 13, 16, 19, 22), but now there is an extra source of variation in Y, so a smaller fraction of the total variation in Y can be explained by X.  Thus, R² is smaller in the second graph.
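
The comparison can be reproduced numerically.  The Python sketch below computes R² as 1 minus the ratio of unexplained to total variation, and SEE as the square root of the unexplained variation divided by n - 2.  The data are made up for illustration and are not the points plotted in the graphs; they were chosen so that both sets give the predicted values 13, 16, 19, 22 at X = 4, 6, 8, 10.  The function name fit_summary is hypothetical.

```python
import math


def fit_summary(x, y):
    """Least-squares fit of Y = b0 + b1*X; return (b0, b1, r_squared, see)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Total variation of Y around its mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Unexplained variation: squared deviations from the regression line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / sst
    see = math.sqrt(sse / (n - 2))  # standard error of estimate
    return b0, b1, r_squared, see


x = [4, 6, 8, 10]
close_to_line = [13.5, 15.5, 18.5, 22.5]  # made-up data, small deviations
more_scatter = [15.0, 14.0, 17.0, 24.0]   # made-up data, larger deviations

for label, y in (("close", close_to_line), ("scattered", more_scatter)):
    b0, b1, r2, see = fit_summary(x, y)
    print(label, round(b0, 2), round(b1, 2), round(r2, 3), round(see, 3))
```

Both fits give the line Y = 7 + 1.5•X, but R² is about 0.978 (SEE about 0.707) for the first set and about 0.738 (SEE about 2.828) for the second.  The explained variation is identical in the two cases; only the extra scatter changes the fraction.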

To test the overall regression equation, we use the F statistic.  The null hypothesis is that none of the X variables helps to explain the value of Y, that is, that every coefficient other than b0 is zero.  A large F (small p value) leads us to reject that hypothesis.  Usually, if any of the t statistics shows that a coefficient is significantly different from zero, then the F will be large.
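
One common way to write the F statistic is the explained variation per X variable divided by the unexplained variation per degree of freedom.  The Python sketch below applies that formula.  The function name f_statistic is hypothetical, the observed Y values are made up for illustration, and the fitted values 13, 16, 19, 22 are assumed to be the least-squares predictions for those data.  The comparison value 18.51 is the 5% critical value of F with (1, 2) degrees of freedom.

```python
def f_statistic(y, y_hat, k):
    """Overall F for a regression with k X variables plus an intercept.

    y:     observed values of Y
    y_hat: least-squares fitted values from the estimated equation
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    return (ssr / k) / (sse / (n - k - 1))


y_hat = [13.0, 16.0, 19.0, 22.0]  # fitted values at X = 4, 6, 8, 10 (assumed)
f_tight = f_statistic([13.5, 15.5, 18.5, 22.5], y_hat, k=1)  # made-up data
f_noisy = f_statistic([15.0, 14.0, 17.0, 24.0], y_hat, k=1)  # made-up data

F_CRITICAL = 18.51  # 5% critical value, F with (1, 2) degrees of freedom
print(round(f_tight, 3), f_tight > F_CRITICAL)
print(round(f_noisy, 3), f_noisy > F_CRITICAL)
```

Here F is 90 for the first data set and 5.625 for the second, so the null hypothesis is rejected only for the first.  With a single X variable, F equals the square of that variable's t statistic, so the two tests agree.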

last updated March 21, 2001, by James R. Frederick