Survival Analysis

General Purpose of Survival Analysis
Survival analysis is used to study the pattern of survival or failure over time.  The event that defines failure is something that can happen only once to an individual or item.  Researchers often need to describe the survival pattern.  Survival analysis also allows researchers to test whether other factors affect the survival pattern.  For example,
• a tire manufacturer may need to demonstrate that when it uses process A its tires last longer or are less prone to blow-outs than when it uses process B.
• a life insurance company needs to know when an insured retiree will die.
• a pharmaceutical company wants to know whether patients who use its new insulin product have fewer strokes than patients who use the existing insulin product.  Strokes occur randomly over a period of time after starting the product.
• a human resource manager wants to know whether workers assigned to manager A quit sooner than workers assigned to manager B or manager C.
Dependent variable: Time to an event (death, failure, graduation, etc.)
Independent variables: May be categorical or numerical, depending on the method.  Kaplan-Meier and life-table approaches use categorical independent variables.  Cox regression uses continuous variables, but may include dummy variables to represent a categorical independent variable.
Null hypothesis: The pattern of times to the event is the same for every group or every value of the independent variables.  That is, the null hypothesis assumes that the independent variables have no effect on the time to the event.
Test statistic: The log-rank test (used with Kaplan-Meier) results in an approximately chi-square statistic.  The Wilcoxon test (used with Kaplan-Meier) also results in an approximately chi-square statistic.
Rejection region: Right tail of the chi-square distribution.  (Large chi-square values lead to rejection of the null hypothesis.)

Survival analysis MUST be used if some of the data are censored.  Survival analysis usually should be used if the variable of interest is a time to an event.

Survival concepts
Survival function: S(t)  (also called the survivorship function)  The survival function shows the fraction of the original group who survive to various points in time.  The function starts at 100% at time zero and decreases over time as the events happen to the individuals in the study.

Failure function: F(t)  The failure function shows the cumulative fraction of the original group for whom the event has happened by various points in time.  The function starts at zero percent at time zero and increases over time as the events occur.
At any point in time, the sum of the survival fraction and the failure fraction must equal 1.00.  That is, S(t) + F(t) = 1 for all times t.

Density function: f(t)  (also called the probability density function)  The density function gives the fraction of the original group for whom the event occurs during the time interval at t, adjusted for the width of the time interval.  If d = F(t2) - F(t1) is the fraction of the original group for whom the event (death) occurs between t1 and t2, then the density over that interval is d/(t2 - t1).  If we think of continuous time, the interval between t1 and t2 becomes infinitesimally small.  In this case f(t) is the derivative of F(t): f(t) = F'(t)

Hazard function: h(t) or lambda(t)  The hazard function gives the fraction of the individuals who have survived to time t for whom the event will occur during the next time interval.  In continuous time, the hazard function is h(t) = f(t)/S(t)

The graph below illustrates these functions for deaths in an unusual population where people die randomly but the deaths are distributed evenly over a 100-year span.  The density function, f(t), shows that a person has just as great a chance of dying at age 98 as at age 4.  The hazard function, h(t), shows that as the person ages, the chance that the next year will be the person's last increases until it reaches 100% for someone who reaches the age of 99 years.  (When I say this, I am thinking of time in discrete years.  Of course, if we think in terms of discrete months, the hazard only reaches 100% when the person reaches the age of 1199 months, one month short of 1200 months.  In continuous time, the hazard only reaches 100% at age 100.)
S(t) = 1.00 - 0.01t     F(t) = 1.00 - S(t) = 0.01t    f(t) =  F'(t) = 0.01    h(t) = 0.01/(1.00 - 0.01t) = 1/(100 - t)

Survival functions with Uniform Density  f(t) = 0.01
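These formulas are simple enough to check directly.  Here is a minimal Python sketch of the uniform-death example (the function names and the sample ages are mine, not from the text); it confirms that S(t) + F(t) = 1 everywhere and that the hazard climbs toward 100% near age 100:

```python
# Uniform-death example: deaths spread evenly over a 100-year span.
# S(t) = 1 - 0.01t, F(t) = 0.01t, f(t) = 0.01, h(t) = 1/(100 - t)

def S(t):
    return 1.0 - 0.01 * t   # fraction still surviving at age t

def F(t):
    return 0.01 * t         # cumulative fraction failed by age t

def f(t):
    return 0.01             # constant density: deaths spread evenly

def h(t):
    return f(t) / S(t)      # hazard = 1 / (100 - t)

for t in (0, 50, 99):
    # survival fraction + failure fraction always equals 1
    assert abs(S(t) + F(t) - 1.0) < 1e-12
```

For instance, h(50) = 0.01/0.50 = 2%, while h(99) = 0.01/0.01 = 100%, matching the claim that the hazard reaches 100% for someone who reaches age 99 (in discrete years).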

The exponential density function, shown below, results in a constant hazard function.  This is the same pattern shown by the radioactive decay of uranium.  In this graph, lambda is the constant hazard rate and is about 10%.  That means that each year, 10% of the survivors are expected to fail.  The half-life is the time until the survival rate is 50%, which is where S(t) and F(t) cross.
S(t) = e^(-lambda*t)     F(t) = 1.00 - e^(-lambda*t)      f(t) = F'(t) = lambda*e^(-lambda*t)      h(t) = f(t)/S(t) = lambda

Survival functions with Exponential Density  f(t) = lambda*e^(-lambda*t)
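The constant-hazard property and the half-life can also be verified numerically.  This sketch uses lambda = 0.10 as in the graph (the variable names are mine); the half-life formula ln(2)/lambda follows from solving S(t) = 0.5:

```python
import math

lam = 0.10  # constant hazard rate: about 10% of survivors fail each year

def S(t):
    return math.exp(-lam * t)          # survival function

def F(t):
    return 1.0 - math.exp(-lam * t)    # failure function

def f(t):
    return lam * math.exp(-lam * t)    # exponential density

def h(t):
    return f(t) / S(t)                 # hazard; always equals lam

# half-life: the time at which S(t) = 0.5, i.e. where S(t) and F(t) cross
half_life = math.log(2) / lam          # about 6.93 years when lam = 0.10
```

Evaluating h(t) at any age returns 0.10: with an exponential density, the fraction of survivors expected to fail in the next year never changes.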

Censoring
One of the key features of survival analysis is the way it handles censored data.  The times to failure are said to be censored if we cannot observe the starting point (left censored) or the ending point (right censored).  The methods discussed here assume that some of the data have been right censored; that is, for some individuals or items the event had not yet happened when the analysis was performed.  Data will be censored if failure does not occur before the end of the study period, or if for some reason we are not able to get a final reading on the individual or item.  (Patients may move away and be "lost to follow-up".  Test samples may be destroyed by laboratory accidents.)

Why the "hard-numbers" approach is biased
Physicians often want to see the survival rate as "hard numbers", usually after one year or five years.  By this they mean the proportion of people who are known to have died out of all patients whose status is known after five years.  Since survivors can get lost, but dead people are known to be dead, the "hard-numbers" approach has a bias toward counting deaths and missing lost survivors.  So, the "hard-numbers" estimates will be biased toward lower survival rates.  Unbiased estimates can only be obtained by properly handling the censored data.

There are three general approaches to survival analysis:
1. Product-limit (Kaplan-Meier) estimates.  Each time an individual or item fails, S(t) is re-calculated.  The new S(t) is the old S(t-m) times the fraction of those who were being followed at time t-m who survived to time t.  People who were lost in the meantime are removed from both the numerator and the denominator of the fraction.  For example, if the old rate at t = 26 months was 0.60 and at time t = 29 one case out of 18 failed, then the new survival rate for time t = 29 would be 0.60 x (17/18) = 0.5667.  When data are censored, the survival rate does not change, but its variance will change.  The survival function is plotted as a stair-step graph with each step ending when a failure occurred.  JMP IN performs a product-limit analysis.
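The product-limit update above can be sketched in a few lines of Python.  This is a minimal illustration, not a replacement for statistical software: each record is a (time, event) pair with event = 1 for a failure and 0 for a censored case, and censored cases simply leave the risk set without changing the survival rate, exactly as described above:

```python
# Minimal product-limit (Kaplan-Meier) sketch.
# records: list of (time, event) pairs; event = 1 failure, 0 censored.
def kaplan_meier(records):
    """Return the stair-step points [(time, survival)] at each failure time."""
    records = sorted(records)
    s = 1.0                      # running survival estimate
    at_risk = len(records)       # number still being followed
    points = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        removed = 0
        # group every record that shares this time
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            # new S = old S x (fraction at risk who survived this time)
            s *= (at_risk - deaths) / at_risk
            points.append((t, s))
        at_risk -= removed       # failures and censored cases leave the risk set
    return points
```

For example, kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 0)]) starts with 4 at risk, drops S to 3/4 at t = 1, silently removes the case censored at t = 2, and then drops S to 3/4 x 1/2 = 0.375 at t = 3, mirroring the 0.60 x (17/18) step in the text.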
2. Life-table estimates.  These are similar to product-limit estimates, except that instead of recalculating when a failure occurs, life-table estimates recalculate after regular intervals of time, such as a month or a year.  There may have been several failures and several cases lost to follow-up during that interval.  The survival function is plotted by connecting each period's survival rate to the next with a diagonal line segment.  This reflects the fact that we do not know when the failures occurred within the period and that we assume the failures occurred at evenly spaced times within the interval.  Before the development of personal computers, life-table methods were used to calculate survival functions from large data sets.  These days, the speed and convenience of personal computers allows product-limit analyses of large data sets.
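A life-table calculation can be sketched the same way.  This version uses the standard actuarial convention, which is an assumption beyond what the text states: cases censored during an interval are treated as at risk for half the interval, so the effective number at risk is n - c/2:

```python
# Minimal actuarial life-table sketch.
# intervals: list of (deaths, censored) counts, one pair per period.
def life_table(intervals, n_start):
    """Return the survival estimate at the end of each period."""
    n = n_start                  # number entering the current interval
    s = 1.0                      # running survival estimate
    out = []
    for deaths, censored in intervals:
        # actuarial adjustment: censored cases count as at risk for half the period
        n_eff = n - censored / 2.0
        s *= 1.0 - deaths / n_eff
        out.append(s)
        n -= deaths + censored   # both failures and censored cases leave
    return out
```

For instance, life_table([(1, 2)], 10) uses an effective risk set of 10 - 2/2 = 9, giving a first-period survival of 1 - 1/9 = 8/9, slightly lower than the 9/10 we would get by ignoring the censored cases.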
3. Parametric.  There are several approaches that try to fit a specific equation to the data by finding the parameters that make the equation fit best in some sense.  One popular example is Cox regression (strictly speaking, semi-parametric, because it does not specify the form of the baseline hazard).  Cox regression assumes that the hazard function for one group of cases is always proportional to the hazard function of a different group.