On the use of multi-objective evolutionary algorithms for survival analysis
Introduction
Survival analysis involves the estimation of the distribution of the time it takes for an event to occur to an object, depending on its features (Kleinbaum, 1996). In a medical domain, objects often correspond to patients, and their features, which are also known as explanatory variables, predictors or covariates, could be demographic and/or physiological information. Events may correspond to the recurrence of a disease or the death of a patient. Hence, the distribution of the time to a specific event for an object is also referred to as the lifetime/failure time distribution. Estimating the lifetime distribution, or equivalently the probability of survival, has many benefits. It allows clinicians to devise a suitable treatment regime and counsel patients about their prognosis. Hence, it helps patients to plan their lives and provide future care for their dependents.
Survival analysis is also widely used in the social and economic sciences, as well as in engineering. Here the objects could correspond to customers or machines/systems, and the event of interest may be that the customer ‘churns’ or that the machine fails. Survival analysis is therefore also referred to as reliability and failure time analysis (Afifi et al., 2003).
The distribution of the time to a specific event dependent upon the features x of an object can be represented by four closely related functions, which are listed as follows:
- probability density function (p.d.f.) f(t|x);
- cumulative distribution function (c.d.f.) F(t|x);
- survival function S(t|x);
- hazard function h(t|x).
These functions are related to each other as shown in Eqs. (1)–(4) (Allison, 1997; Collet, 1994):

  F(t|x) = P(T ≤ t | x)                 (1)
  f(t|x) = dF(t|x)/dt                   (2)
  S(t|x) = P(T > t | x) = 1 − F(t|x)    (3)
  h(t|x) = f(t|x)/S(t|x)                (4)
The survival and hazard functions are the most popular ways of describing the lifetime distribution dependent upon the features x of an object. The hazard function is also often referred to as the force of mortality, instantaneous death rate, or failure rate in a medical domain (Afifi et al., 2003). It is important to note that, although it may be helpful to think of the hazard as an instantaneous probability, it is not a probability, as it can take on values greater than one (Allison, 1997).
A popular way of estimating the survival function is illustrated using a dataset taken from Freireich et al. (1963). The dataset contains the survival times of 42 leukaemia patients with one dichotomous feature that indicates whether or not the patient received a particular treatment. Table 1 contains the data separated according to the treatment feature.
The time is measured in weeks, and the maximum time horizon of the study (the follow-up time) is 35 weeks. Patients shown without plus signs died during the follow-up time, whereas patients shown with a plus sign were censored. In essence, censoring occurs if one knows the status of an object for a particular period of time, but not for the complete follow-up time.
Generally there are three different forms of censoring, which are listed as follows (Kleinbaum, 1996):
- (1) the event does not happen to the object before the maximum time horizon of the study;
- (2) the object is lost to follow-up during the study;
- (3) the object is withdrawn from the study because a different event made it impossible to follow it up any further.
The first type of censoring indicates that the event did not happen to the object during the complete follow up time (e.g. the patient did not die during the study). Note that the patient can usually enter the study at any time during the follow up time.
The second type of censoring occurs when one does not know the status of the object after a particular point in time. For example, the patient may have failed to attend an appointment in the clinic (Griffin, 1998).
The third type of censoring occurs because an event that is not relevant to the study happened to the object and made it impossible to follow the object up any further (e.g. the event of interest may be ‘cancer related death’ but the patient died from a car accident).
All three types of censoring are often referred to as right censoring, which is the most common form of censoring (Lawless, 1982). Another form of censoring is known as left censoring. It occurs if the status of an object is unknown at the left side of the follow-up period (e.g. the diagnosis of a disease does not necessarily mean that one knows when the disease started). If the population is both right and left censored one speaks of interval censoring.
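For illustration, right-censored observations such as those in Table 1 can be encoded as (time, event-indicator) pairs. The short Python sketch below is our own illustration (the function name parse_observation is not from the paper); it converts the plus-sign notation used in the tables:

```python
def parse_observation(entry):
    """Convert a Table 1 style entry into (time, event), where
    event = 1 means the event (e.g. death) was observed and
    event = 0 means the observation was right-censored ('+')."""
    entry = entry.strip()
    if entry.endswith("+"):
        return int(entry[:-1]), 0  # censored observation
    return int(entry), 1           # event observed

# e.g. entries '6' and '9+' from the treated group of the leukaemia data
print(parse_observation("6"))   # (6, 1)
print(parse_observation("9+"))  # (9, 0)
```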
One non-parametric approach for estimating the survival function is the Kaplan–Meier method. Its computation is summarised in Eq. (5):

  Ŝ(t) = ∏_{i: t_i ≤ t} (1 − d_i/n_i)   (5)

Here, d_i corresponds to the number of events (e.g. deaths) at a time t_i where one or more events occurred, and n_i corresponds to the number of objects (e.g. patients) that are still observed at time t_i. In the medical domain, these patients are also called the risk set. Given the product structure of Eq. (5), it is not surprising that the Kaplan–Meier method is often referred to as the product-limit estimator. The survival function values for the leukaemia data are summarised in Table 2 for patients with treatment and in Table 3 for patients without treatment.
It should be noted that Kaplan–Meier estimates cannot be used directly to approximate the lifetime distribution as a function of the explanatory variables. Instead, groups containing patients with particular feature values have to be determined beforehand. Hence, the estimated survival curves are only reliable if the number of samples within each group is large and the amount of censoring is low.
The corresponding survival curves are shown in Fig. 1. Here, the solid line corresponds to the survival curve for patients who were treated, and the dashed line corresponds to the survival curve for patients without treatment.
It can clearly be seen that the estimated survival function of patients without treatment is worse than that of treated patients.
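The product-limit computation of Eq. (5) can be sketched in a few lines of Python. The kaplan_meier function below is our own illustration, applied to the treated (6-MP) group of the Freireich et al. (1963) data:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate S(t) at each distinct event time.
    times: observed times; events: 1 = event occurred, 0 = censored."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival, s = {}, 1.0
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)  # risk set just before t
        s *= 1.0 - deaths[t] / n_at_risk             # product-limit update
        survival[t] = s
    return survival

# Treated group of the leukaemia data; '+' entries of Table 1 become event = 0
treated = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 1), (10, 0),
           (11, 0), (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1),
           (23, 1), (25, 0), (32, 0), (32, 0), (34, 0), (35, 0)]
S = kaplan_meier([t for t, _ in treated], [e for _, e in treated])
print(round(S[6], 3), round(S[23], 3))  # 0.857 0.448
```

The two printed values reproduce the well-known estimates for this dataset at 6 and 23 weeks.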
Another method for estimating lifetime distributions is the proportional hazards model, also known as the Cox model (Cox, 1972). The hazard function of an individual is estimated using Eq. (6):

  h(t|x_i) = h_0(t) ψ(x_i)   (6)

Here, h_0(t) is the baseline hazard and ψ(x_i) is the relative hazard. The baseline hazard is the hazard for individuals with x = 0. The relative hazard is a factor that makes the hazard of individual i proportional to the baseline hazard. Proportionality means that none of the survival curves of the population cross (Collet, 1994). It also means that the logarithms of the estimated hazard functions keep a constant distance (Marubini and Valsecchi, 1995); in other words, they should be strictly parallel (Allison, 1997). Of course, this assumption is not very realistic, as one group of patients might have higher hazard values at the beginning of the study but lower hazard values later. If the converse occurred for another group of patients, the survival curves of the two groups would cross.
To model the dependency of the hazard on the feature values of an object, one usually replaces the relative hazard with a function of, for example, a linear combination of the features. The exponential function is commonly used to ensure that the relative hazard remains positive. This leads to Eq. (7):

  h(t|x_i) = h_0(t) exp(β′x_i)   (7)

The linear combination of the features, β′x_i, is also referred to as the risk score or prognostic index. Unfortunately, the coefficients β cannot be estimated using the ordinary maximum likelihood method, because the baseline hazard h_0(t) is not defined parametrically. To estimate the coefficients β, Cox put forward the partial likelihood method. This method allows the estimation of the coefficients without specifying the baseline hazard h_0(t). The computation of the partial likelihood is summarised in Eq. (8):

  L(β) = ∏_{i=1}^{n} [ exp(β′x_i) / Σ_{j=1}^{n} y_j(t_i) exp(β′x_j) ]^{δ_i}   (8)

Here, y_j(t_i) = 1 if t_j ≥ t_i and y_j(t_i) = 0 if t_j < t_i, where t_j corresponds to the time of the event or the time of censoring of object j. This is a convenient way to define the risk set in the denominator at the time of the ith event. The exponent δ_i indicates whether the object/patient was censored (δ_i = 0) or whether the event occurred (δ_i = 1). The partial likelihood can be maximised using, for example, a version of the Newton–Raphson algorithm. It has to be noted that Eq. (8) is only valid for data without ties, in which no two events occur at the same time. The computation of the exact partial likelihood for tied data can be a daunting task (see, for example, Allison, 1997). Methods for the approximation of the partial likelihood for tied data are discussed in Allison (1997), Collet (1994) and Therneau and Grambsch (2000). Unfortunately, the standard Cox model (Eq. (7)) has several shortcomings, which are summarised as follows:
- It assumes proportional hazards;
- The baseline hazard has to be determined in order to obtain values of the hazard/survival function for an individual;
- The linear combination of the features cannot be used to model interaction effects;
- The estimation of the partial likelihood is computationally very expensive when the data contain ties. Ties are likely in practice, as time is often measured on a discrete scale;
- It cannot be used (in its standard form) to model data containing time-dependent features.
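To make the partial likelihood of Eq. (8) concrete, the following sketch evaluates its logarithm for a given coefficient vector. It is our own illustration, assuming untied event times; the function name is not from the paper:

```python
import math

def cox_partial_log_likelihood(beta, times, events, X):
    """Log of the partial likelihood in Eq. (8), valid for untied data.
    beta: coefficient vector; X: list of feature vectors;
    events[i] = 1 if the event occurred for object i, 0 if censored."""
    def risk_score(x):
        return sum(b * xi for b, xi in zip(beta, x))
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:  # censored observations contribute no factor
            continue
        # risk set: all objects still observed at time t_i (y_j(t_i) = 1)
        denom = sum(math.exp(risk_score(X[j]))
                    for j, t_j in enumerate(times) if t_j >= t_i)
        ll += risk_score(X[i]) - math.log(denom)
    return ll

# With beta = 0 and three uncensored, untied observations, Eq. (8) reduces
# to 1/3 * 1/2 * 1/1, i.e. a log partial likelihood of -log(6)
ll = cox_partial_log_likelihood([0.0], [1, 2, 3], [1, 1, 1], [[0], [1], [2]])
print(round(ll, 4))  # -1.7918
```

In practice this function would be handed to a numerical optimiser (e.g. Newton–Raphson, as the text notes) to estimate β.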
These shortcomings can be alleviated using methods that are, for example, discussed in Marubini and Valsecchi (1995) and Therneau and Grambsch (2000). However, it has to be emphasised that the correct use of these methods is quite difficult, as they require extensive statistical knowledge. It is therefore not surprising that researchers are striving to develop more practical approaches, which make fewer assumptions and do not presume extensive statistical knowledge.
This paper investigates the use of a multi-objective evolutionary algorithm (MOEA) for survival analysis. A similar MOEA has already been successfully applied to several classification problems (Setzkorn and Paton, 2005). Although there are many successful applications of evolutionary algorithms to regression and classification problems, we believe that this is the first study that uses a MOEA for survival analysis. This serves to demonstrate the versatility of evolutionary approaches for model extraction from data.
Existing work
Many proposed methods to overcome the shortcomings of the Cox model use the fact that, in the discrete time domain, the hazard corresponds to a conditional probability, as shown in Eq. (9) (Lawless, 1982; Willett, 1993):

  h(t_k|x) = P(T = t_k | T ≥ t_k, x)   (9)

This is in contrast to the continuous time domain, in which the hazard corresponds to a rate that can have values greater than one. In the discrete case, the survival probabilities can be computed according to Eq. (10) for each time interval t_k:

  S(t_k|x) = ∏_{j=1}^{k} (1 − h(t_j|x))   (10)
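The conversion from discrete hazards to survival probabilities described above is a running product of (1 − hazard) terms. A minimal sketch (function name and hazard values are our own illustration):

```python
def discrete_survival(hazards):
    """S(t_k) = prod over j <= k of (1 - h(t_j)), i.e. the survival
    probability after each discrete time interval."""
    survival, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h  # probability of also surviving this interval
        survival.append(s)
    return survival

# hypothetical discrete hazards for three consecutive intervals
print([round(s, 3) for s in discrete_survival([0.2, 0.1, 0.3])])
# [0.8, 0.72, 0.504]
```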
Using a multi-objective evolutionary algorithm for survival analysis
This section describes the implemented MOEA. It begins with a brief summary of the algorithm and then describes its particular components. Fig. 2 depicts the structure of the algorithm. It is similar to other evolutionary algorithms (e.g. Michalewicz and Fogel, 2005). In broad terms, the algorithm proceeds as follows. Candidate solutions/individuals (i.e. a population of RBFNs) are initialised randomly. After this, the variation operators are applied to some of the RBFNs to recombine and/or
Results and discussions
The implemented MOEA is evaluated on several benchmark datasets. In addition, it is evaluated on a ‘real-world’ medical dataset. The benchmark datasets comprise the leukaemia dataset (Freireich et al., 1963; see also Section 1) and four artificial datasets.
The datasets were randomly split into a training dataset and a test dataset for each experiment. The training datasets contained two-thirds and the test datasets one-third of the original dataset. Before the MOEA was run, a holdout dataset was
Conclusions and further work
A multi-objective evolutionary algorithm for the extraction of models for survival analysis has been proposed and evaluated. The evaluation of the MOEA on several benchmark datasets and one medical problem has shown that the approach is capable of producing accurate and valid models. The evaluation on the artificial dataset also emphasised that the approach can cope with interaction effects and noisy non-proportional hazard distributions. This is in contrast to, for example, the Cox model. In
Acknowledgements
We would like to express our gratitude to Dr. Jessica Grabham, Dr. Tony Fisher and Dr. Peter McBurney for the suggestions they made to improve the paper.
References (55)

- Current management of uveal melanoma. Eur. J. Cancer Suppl. (2005)
- et al. A novel neural network-based survival analysis model. Neural Networks (2003)
- et al. The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukaemia. Blood (1963)
- et al. Knowledge visualization techniques for machine learning. Intell. Data Anal. (1998)
- et al. A Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer. Artif. Intell. Med. (2003)
- et al. On the use of multi-objective evolutionary algorithms for the induction of fuzzy classification rule systems. BioSystems (2005)
- et al. Computer-Aided Multivariate Analysis (2003)
- Survival Analysis using SAS: A Practical Guide (1997)
- et al. A time-dependent discrimination index for survival data. Stat. Med. (2005)
- et al. Neural network survival analysis for personal loan data. J. Oper. Res. Soc. (2005)