DSC 510--Quantitative Methods

JMP Logistic Regression

- Setting up the data table
- Running the model
- Interpreting the results
  - Whole model effects
  - Parameter Estimates
  - Effects Tests
  - Lack of Fit Tests

There are two kinds of logistic regression available in JMP: nominal and ordinal, depending on the measurement scale of the response (Y) variable. JMP selects the appropriate method automatically from the modeling type of the Y column. Classical logistic regression uses a dichotomous response variable, but JMP and several other statistical packages can use Y variables with several levels.

There are two tools in JMP that can perform logistic regressions: Fit Y by X and Fit Model. The Fit Y by X tool can only use one independent (X) variable.

A sample dataset for this example accompanies these notes.

Setting up the data table

When you are creating your data table, bear in mind that JMP determines which response is the "success" by which comes first in the ASCII collating sequence. (In the ASCII collating sequence, the space comes before the numbers, numbers come before the upper case letters, and upper case letters come before the lower case letters.) Thus, if your Y variable has the values "True" or "False", JMP will treat "False" as the success, since F comes before T. Similarly, if Y has the values "Yes" or "No", then "No" is the success. If Y has the values 0 or 1, then 0 is the success. A more natural labeling system is to use "Hit" for success and "Miss" for failure, since H comes before M.

The independent variables should be of the continuous type.

Running the model

To run the model,

- Click on Analyze, then either Fit Y by X or Fit Model, depending on how many Xs are in your model.
- Highlight your Y variable in the Select Columns box on the left, then click on the Y button.
- Highlight your X variable or variables in the Select Columns box, then click on the X or Add button, whichever appears in the window.
- Click on the Run Model button, or the OK button, whichever appears in the window. A Fit Nominal/Ordinal Logistic report window should appear.
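JMP does the maximum-likelihood fitting behind the Run Model button, but the calculation can be sketched in plain Python with Newton-Raphson (Fisher scoring). The dataset below is a hypothetical reconstruction, chosen only to be consistent with the report tables in these notes (five observations at X = 4 with one Hit, five at X = 12 with two Hits); it is an assumption, not the original sample file:

```python
from math import exp, log

# Hypothetical data (assumption): 1 Hit of 5 at X = 4, 2 Hits of 5 at X = 12
xs = [4] * 5 + [12] * 5
ys = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0]   # 1 = "Hit" (success)

b0, b1 = 0.0, 0.0                        # start from a zero guess
for _ in range(25):                      # Newton-Raphson / Fisher scoring
    g0 = g1 = i00 = i01 = i11 = 0.0      # gradient and information matrix
    for x, y in zip(xs, ys):
        p = 1 / (1 + exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
        w = p * (1 - p)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    det = i00 * i11 - i01 * i01
    b0 += (i11 * g0 - i01 * g1) / det    # step = I^{-1} * gradient
    b1 += (i00 * g1 - i01 * g0) / det

# -log likelihood of the fitted (full) model
neg_ll = 0.0
for x, y in zip(xs, ys):
    p = 1 / (1 + exp(-(b0 + b1 * x)))
    neg_ll -= y * log(p) + (1 - y) * log(1 - p)

print(b0, b1, neg_ll)   # approximately -1.8767, 0.1226, 5.86707
```

With this reconstructed sample the fit reproduces the report below: the intercept and slope match the Parameter Estimates table and the -log likelihood matches the Full row of the Whole Model Test.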

Interpreting the results

Whole Model Test

| Model      | -LogLikelihood | DF | Chi Square | Prob>ChiSq |
| ---------- | -------------- | -- | ---------- | ---------- |
| Difference | 0.24157        | 1  | 0.4831     | 0.4870     |
| Full       | 5.86707        |    |            |            |
| Reduced    | 6.10864        |    |            |            |

R² (U): 0.0395
Observations: 10

The numbers in the -log likelihood
column are based on the probability of obtaining a random sample identical
to the observed sample under various assumptions. These values fall as
the likelihood of obtaining the sample rises.

"Full" refers to the likelihood
using the full model; that is, using the assumption that the probability
that Y will be a "success" changes from observation to observation as the
X variables in your model change.

"Reduced" refers to estimates using
only b_{0} and
none of your X variables. Under this assumption, the probability
that Y will be a "success" is a constant for all observations. Since
it uses the more restrictive assumption, the reduced model will have the
lower probability and hence the larger -log likelihood.

The Difference is simply the difference
in the -log likelihood between the Reduced and the Full models.

Twice the difference in the -log likelihoods (the -2LL, or likelihood-ratio, statistic) has a distribution that is approximately a chi-square distribution. Its degrees of freedom are the number of Xs that are deleted from the Full model to arrive at the Reduced model--in the example it is 1.

The chi-square test used here, therefore, tests the null hypothesis that removing all of the X variables from the model leaves the likelihood of observing the sample unchanged. In other words, it tests the null hypothesis that the full model is no better at explaining the probability of success than naively using a constant probability for all observations. If we are able to reject the null hypothesis, we have evidence that some of the X variables *do* have an effect on the probability of success. This chi-square test is always a right-tailed test, so rejection requires a small *p* value. A large *p* value, such as 0.9834, indicates that the likelihood of the full model is almost the same as that of the reduced model.

The R² (R square) is the logit R² (also called the uncertainty coefficient, U). It is calculated as (the difference in the -log likelihood)/(the reduced model -log likelihood). Similar to the R² in multiple regression, a value of 0 indicates a weak model and that the Xs have no predictive effect, while an R² of 1 indicates a perfect fit.
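The whole-model numbers above fit together arithmetically, and that can be verified from the two -log likelihoods alone. A short Python check (the values are copied from the Whole Model Test table; the chi-square tail probability for 1 df is computed with the standard identity p = erfc(√(x/2))):

```python
from math import sqrt, erfc

# Values from the Whole Model Test table
full_nll    = 5.86707   # -log likelihood, full model
reduced_nll = 6.10864   # -log likelihood, intercept-only model

diff = reduced_nll - full_nll         # the Difference row: 0.24157
chi_sq = 2 * diff                     # likelihood-ratio statistic, df = 1
p_value = erfc(sqrt(chi_sq / 2))      # right-tail p for chi-square with 1 df
r_squared_u = diff / reduced_nll      # logit R² (uncertainty coefficient)

print(round(diff, 5), round(chi_sq, 4), round(p_value, 4), round(r_squared_u, 4))
```

This reproduces the Difference, Chi Square, Prob>ChiSq, and R² (U) entries of the table.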

Parameter Estimates

| Term      | Estimate | Std Error | ChiSquare | Prob>ChiSq | Odds Ratio |
| --------- | -------- | --------- | --------- | ---------- | ---------- |
| Intercept | -1.87667 | 1.73804   | 1.17      | 0.2802     |            |
| X         | 0.12260  | 0.18042   | 0.46      | 0.4968     | 2.66659    |

These are the estimated coefficients of the log-odds equation.

A positive number in the Estimate column (except for the intercept row) indicates that higher values of the variable are associated with a greater likelihood of "success". Recall from above that in JMP "success" means the value of Y that comes first in the ASCII alphabet. The line below the table ("For log odds of ...") gives the event that is "success" over the event that is "failure". In the example above, "Hit" is treated as success. Since its coefficient is 0.1226 > 0, we know that as X increases, a "Hit" becomes more likely.

The chi square and its *p* value are like the *t* test of a coefficient in multiple regression. They test the null hypothesis that the variable's coefficient, β_{1}, is equal to zero. If the *p* value is greater than the level of significance α, we do not have evidence to show that X has an effect on the probability of success. If the *p* value is small, say 0.0076, then we do have evidence that X has a real effect that could be found in other samples.

The odds ratio is not e^{b_i}. It is e^{b_i(max X - min X)}, where max X is the largest value that the X has in the dataset and min X is the smallest value that the X has in the dataset. So what JMP is reporting as the odds ratio really is the ratio of the odds at the largest X to the odds at the smallest X for that variable. This measure gives a feel for how much impact this variable has on the odds if we see as much variation in X as there is in this sample. X could have a large impact on the probability of success for two reasons:

- each one-unit increase in X has a large impact on the likelihood, or
- there is a lot of variation in X though the effect of each one-unit change in X might be small.

In this example the largest X was 12 and the smallest X was 4, so the number in the odds ratio column for this X variable is e^{0.1226(12 - 4)} = 2.6667. Remember from the dataset in this example that when X = 4, the probability of success P = 1/5 and the odds R = 1/4. When X = 12, P = 2/5 and R = 2/3. The number in the odds ratio column is therefore (2/3) ÷ (1/4) = 2.6667.
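Both routes to this number can be checked in a couple of lines of Python, using the coefficient and X range reported above:

```python
from math import exp

b = 0.12260                     # coefficient of X from the report
x_max, x_min = 12, 4            # range of X in the sample
odds_ratio = exp(b * (x_max - x_min))

# The same number from the raw odds: R = 2/3 at X = 12, R = 1/4 at X = 4
print(round(odds_ratio, 4), round((2 / 3) / (1 / 4), 4))
```

Both calculations agree with the 2.66659 shown in the Odds Ratio column (up to rounding of the coefficient).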

__Effect Tests__

There are two kinds of effect tests available in JMP: Wald chi-square tests and likelihood ratio chi-square tests. Usually they give very similar results. Likelihood ratio tests are available by clicking on the red triangle for the pop-up menu on the Logistic Fit bar at the top of the report window, then clicking on Likelihood Ratio Tests. The null hypothesis is that the variable has no effect on the probability of success. The chi-square tests in the Parameter Estimates section seem to be the same as the Wald chi-square tests.

__Lack of Fit Tests__

These tests check whether including interaction terms between your effects would have improved the fit of the model. If there is only one X variable, lack of fit tests are unnecessary--there are no possible interactions. Similarly, if you use all possible interactions, the lack of fit test will not be performed. If the *p* value for the lack of fit test is small, then the fit could be improved by including interaction terms or squares of the Xs, etc.

created on April 11, 2001,
by James R. Frederick

copyright 2001 James R. Frederick