The University of North Carolina at Pembroke
DSC 510--Quantitative Methods
JMP  Logistic Regression

There are two kinds of logistic regression available in JMP: nominal and ordinal, depending on the measurement scale of the response (Y) variable.  JMP will automatically detect the type of Y you are using and select the right method.  Classical logistic regression uses a dichotomous response variable, but JMP and several other statistical packages can use Y variables with several levels.

There are two tools in JMP that can perform logistic regressions: Fit Y by X and Fit Model. The Fit Y by X tool can only use one independent (X) variable.

A sample dataset is available by clicking here.

Setting up the data table
When you are creating your data table, bear in mind that JMP determines which response will be "success" by which comes first in the ASCII collating sequence. (In the ASCII collating sequence, numbers come before the space, the space comes before the upper case letters, and upper case letters come before lower case letters.) Thus, if your Y variable has the values "True" or "False", JMP will treat "False" as the success since F comes before T.  Similarly, if Y has the values "Yes" or "No", then "No" is the success.  If Y has the values 0 or 1, then 0 is the success.  A more natural labeling system is to use "Hit" for success and "Miss" for failure, since H comes before M.
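JMP's rule can be anticipated by simply sorting the response labels. A minimal sketch (in Python rather than JMP, purely to illustrate the ASCII ordering):

```python
# Sketch: JMP treats whichever response level sorts first in the ASCII
# collating sequence as the "success" level.  Sorting the labels as
# strings predicts which level JMP will pick.
def jmp_success_level(levels):
    """Return the level JMP will model as 'success' (first in ASCII order)."""
    return sorted(levels)[0]

print(jmp_success_level(["True", "False"]))  # False (F comes before T)
print(jmp_success_level(["Yes", "No"]))      # No
print(jmp_success_level(["1", "0"]))         # 0
print(jmp_success_level(["Hit", "Miss"]))    # Hit (H comes before M)
```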

The independent variables should be of the continuous type.

Running the model
To run the model,

1. Click on Analyze, then either Fit Y by X or Fit Model, depending on how many Xs are in your model.
2. Highlight your Y variable in the Select Columns box on the left, then click on the Y button.
3. Highlight your X variable or variables in the Select Columns box, then click on the X or Add button, whichever appears in the window.
4. Click on the Run Model button, or the OK button, whichever appears in the window.  A Fit Nominal/Ordinal Logistic report window should appear.
Interpreting the results
Whole Model Test

Model        -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference        0.24157      1      0.4831       0.4870
Full              5.86707
Reduced           6.10864

R² (U)          0.0395
Observations    10

The numbers in the -log likelihood column are based on the probability of obtaining a random sample identical to the observed sample under various assumptions. These values fall as the likelihood of obtaining the sample rises.
"Full" refers to the likelihood using the full model; that is, using the assumption that the probability that Y will be a "success" changes from observation to observation as the X variables in your model change.
"Reduced" refers to estimates using only b0 and none of your X variables.  Under this assumption, the probability that Y will be a "success" is a constant for all observations.  Since it uses the more restrictive assumption, the reduced model will have the lower probability and hence the larger -log likelihood.
The Difference is simply the difference in the -log likelihood between the Reduced and the Full models.
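The Full and Reduced values in the table can be reproduced by hand. A sketch in Python, assuming the example dataset consists of 10 observations, 5 at X = 4 (one Hit) and 5 at X = 12 (two Hits), as implied by the proportions quoted later in this handout:

```python
import math

# -log likelihood under the reduced model: one constant probability of
# success for every observation, p = 3 Hits / 10 observations = 0.3.
p = 3 / 10
reduced = -(3 * math.log(p) + 7 * math.log(1 - p))   # about 6.10864

# -log likelihood under the full model: the fitted probability differs
# by X value (0.2 when X = 4, 0.4 when X = 12).
full = -(1 * math.log(0.2) + 4 * math.log(0.8)
         + 2 * math.log(0.4) + 3 * math.log(0.6))    # about 5.86707

print(round(reduced, 5), round(full, 5), round(reduced - full, 5))
```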

Twice the difference in the -log likelihoods (-2LL) has a distribution that is approximately a chi-square distribution.  Its degrees of freedom are the number of Xs that are deleted from the Full model to arrive at the Reduced model--in the example it is 1.
The chi-square test used here, therefore, tests the null hypothesis that removing all of the X variables from the model leaves the likelihood of observing the sample unchanged.  In other words, it tests the null hypothesis that the full model is no better at explaining the probability of success than naively using a constant probability for all observations.  If we are able to reject the null hypothesis, we have evidence that some of the X variables do have an effect on the probability of success.  This chi-square test is always a right-tailed test, so a small p value is needed to reject the null hypothesis.  A large p value, such as 0.9834, indicates that the likelihood of the full model is almost the same as that of the reduced model.
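The test statistic and p value in the table can be checked directly. A small Python sketch (for 1 degree of freedom the chi-square upper-tail probability reduces to erfc(sqrt(x/2)), so no statistics library is needed):

```python
import math

diff_neg_loglik = 0.24157          # Difference row of the Whole Model Test
chi_square = 2 * diff_neg_loglik   # the -2LL statistic, about 0.4831

# Right-tail probability of a chi-square with df = 1.
p_value = math.erfc(math.sqrt(chi_square / 2))   # about 0.4870

print(round(chi_square, 4), round(p_value, 4))
```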

The R² (R square) is the logit R² (also called the uncertainty coefficient, U).  It is calculated as (the difference in the -log likelihood)/(the reduced model -log likelihood).  Similar to the R² in multiple regression, a value of 0 indicates a weak model and that the Xs have no predictive effect, while an R² of 1 indicates a perfect fit.
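Using the numbers from the Whole Model Test table, the reported value follows in one line:

```python
# R-square (U) = (difference in -log likelihood) / (reduced -log likelihood)
difference = 0.24157
reduced = 6.10864
r2_u = difference / reduced
print(round(r2_u, 4))   # about 0.0395, matching the report
```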

Parameter Estimates

Term         Estimate   Std Error   ChiSquare   Prob>ChiSq   Odds Ratio
Intercept    -1.87667     1.73804        1.17       0.2802
X             0.12260     0.18042        0.46       0.4968      2.66659
For log odds of Hit/Miss

The coefficients of the odds-ratio equation

R = e^(β0 + β1X1 + β2X2 + β3X3)

are found in the Parameter Estimates section.
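Once the coefficients are estimated, the fitted odds R and the probability of success at any X follow directly. A sketch in Python using this example's estimates:

```python
import math

# Coefficients from the Parameter Estimates table above.
b0, b1 = -1.87667, 0.12260

def odds(x):
    """Fitted odds of success: R = e^(b0 + b1*X)."""
    return math.exp(b0 + b1 * x)

def prob(x):
    """Probability of success: p = R / (1 + R)."""
    r = odds(x)
    return r / (1 + r)

print(round(odds(4), 4), round(prob(4), 4))    # about 0.25 and 0.2
print(round(odds(12), 4), round(prob(12), 4))  # about 0.6667 and 0.4
```

These match the dataset's observed values quoted later in the handout (R = 1/4 and P = 1/5 at X = 4; R = 2/3 and P = 2/5 at X = 12).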

A positive number in the Estimate column (except for the intercept row) indicates that higher values of the variable are associated with a greater likelihood of "success".  Recall from above that in JMP "success" means the value of Y that comes first in the ASCII alphabet.  The line below the table ("For log odds of ...") gives the event that is "success" over the event that is "failure".  In the example above, "Hit" is treated as success.  Since its coefficient is 0.1226 > 0, we know that as X increases, a "Hit" becomes more likely.

The chi square and its p value are like the t test of a coefficient in multiple regression.  They test the null hypothesis that the variable's coefficient, β1, is equal to zero.  If the p value is greater than the level of significance α, we do not have evidence to show that X has an effect on the probability of success.  If the p value is small, say 0.0076, then we do have evidence that X has a real effect that could be found in other samples.
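This chi square is the squared ratio of the estimate to its standard error (a Wald statistic). Checking the X row of this example in Python:

```python
import math

# Wald chi-square for a coefficient: (estimate / std error)^2, compared
# to a chi-square with 1 df (upper tail p = erfc(sqrt(x/2)) for df = 1).
estimate, std_error = 0.12260, 0.18042
chi_square = (estimate / std_error) ** 2         # about 0.46
p_value = math.erfc(math.sqrt(chi_square / 2))   # about 0.4968

print(round(chi_square, 2), round(p_value, 4))
```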

The odds ratio is not e^bi.  It is e^(bi(max X - min X)), where max X is the largest value that the X has in the dataset and min X is the smallest value that the X has in the dataset.  So what JMP reports as the odds ratio is really the ratio of the largest R to the smallest R for that variable.  This measure gives a feel for how much impact the variable has on the odds if we see as much variation in X as there is in this sample.  X could have a large impact on the probability of success for two reasons:

1. each one-unit increase in X has a large impact on the likelihood, or
2. there is a lot of variation in X though the effect of each one-unit change in X might be small.
This measure incorporates both effects, whereas the coefficient b itself only measures the first effect.

In this example the largest X was 12 and the smallest X was 4, so the number in the odds ratio column for this X variable is e^(0.1226(12 - 4)) = 2.6667.  Remember from the dataset in this example that when X = 4, P = 1/5 and R = 1/4.  When X = 12, P = 2/5 and R = 2/3.  The number in the odds ratio column is therefore 2/3 ÷ 1/4 = 2.6667.
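Both routes to the reported odds ratio can be verified in a few lines of Python:

```python
import math

# Route 1: JMP's formula, e^(b * (max X - min X)).
b = 0.12260
odds_ratio = math.exp(b * (12 - 4))   # about 2.6666

# Route 2: ratio of the largest fitted odds to the smallest fitted odds,
# R = 2/3 at X = 12 over R = 1/4 at X = 4.
ratio_of_odds = (2 / 3) / (1 / 4)     # 8/3, about 2.6667

print(round(odds_ratio, 4), round(ratio_of_odds, 4))
```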

Effect Tests
There are two kinds of effect tests available in JMP: Wald chi-square tests and likelihood ratio chi-square tests.  Usually they give very similar results.  Likelihood ratio tests are available by clicking on the red triangle for the pop-up menu on the Logistic Fit bar at the top of the report window, then clicking on Likelihood ratio tests.  The null hypothesis is that the variable has no effect on the probability of success.  The chi-square tests in the Parameter Estimates section appear to be the same as the Wald chi-square tests.

Lack of Fit Tests
These tests assess whether including interaction terms between your effects would have improved the fit of the model.  If there is only one X variable, lack of fit tests are unnecessary--there are no possible interactions.  Similarly, if you use all possible interactions, the lack of fit test will not be performed.  If the p value for the lack of fit test is small, then the fit could be improved by including interaction terms, squares of the Xs, etc.

created on April 11, 2001, by James R. Frederick