Biosystems

Volume 87, Issue 1, January 2007, Pages 31–48

On the use of multi-objective evolutionary algorithms for survival analysis

https://doi.org/10.1016/j.biosystems.2006.03.002

Abstract

This paper proposes and evaluates a multi-objective evolutionary algorithm for survival analysis. One aim of survival analysis is the extraction of models from data that approximate lifetime/failure time distributions. These models can be used to estimate the time until an event happens to an object. The use of multi-objective evolutionary algorithms for survival analysis has several advantages: they can cope with feature interactions and noisy data, and they are capable of optimising several objectives. This is important, as model extraction is a multi-objective problem. It has at least two objectives: the extraction of accurate models and the extraction of simple models. Accurate models are required to achieve good predictions. Simple models are important to prevent overfitting, to improve the transparency of the models, and to save computational resources. Although there is a plethora of evolutionary approaches for extracting classification and regression models, the presented approach is one of the first applied to survival analysis. The approach is evaluated on several artificial datasets and one medical dataset. It is shown that the approach is capable of producing accurate models, even for problems that violate some of the assumptions made by classical approaches.

Introduction

Survival analysis involves the estimation of the distribution of the time it takes for an event to occur to an object, depending on its features (Kleinbaum, 1996). In a medical domain, objects often correspond to patients, and their features, which are also known as explanatory variables, predictors and covariates, could be demographic and/or physiological information. Events may correspond to the recurrence of a disease or the death of a patient. Hence, the distribution of the time to a specific event for an object is also referred to as the lifetime/failure time distribution. Estimating the lifetime distribution, or equivalently the probability of survival, has many benefits. It allows clinicians to devise a suitable treatment regime and to counsel patients about their prognosis. Hence, it helps patients to plan their lives and to provide future care for their dependents.

Survival analysis is also widely used in the social and economic sciences, as well as in engineering. Here the objects could correspond to customers or machines/systems, and the event of interest may be that the customer ‘churns’ or that the machine fails. Survival analysis is therefore also referred to as reliability and failure time analysis (Afifi et al., 2003).

The distribution of the time to a specific event dependent upon the features of an object can be represented by four closely related functions, which are listed as follows:

  • density function f(t,x) (p.d.f.);

  • cumulative distribution function F(t,x) (c.d.f.);

  • survival function S(t,x);

  • hazard function h(t,x).

These functions are related to each other as shown in Eqs. (1)–(4) (Allison, 1997; Collet, 1994):

f(t,x) = h(t,x)\, e^{-\int_0^t h(u)\,du} = \frac{dF(t,x)}{dt} = -\frac{dS(t,x)}{dt} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\, x)}{\Delta t}    (1)

F(t,x) = \int_0^t f(u)\,du = P(T < t,\, x) = 1 - S(t,x)    (2)

S(t,x) = e^{-\int_0^t h(u)\,du} = 1 - F(t,x) = P(T \ge t,\, x)    (3)

h(t,x) = -\frac{d}{dt}\log S(t,x) = \frac{f(t,x)}{S(t,x)} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t,\, x)}{\Delta t}    (4)

The survival and hazard functions are the most popular ways of describing the lifetime distribution dependent upon the features x of an object. The hazard function is also often referred to as the force of mortality, instantaneous death rate, or failure rate in a medical domain (Afifi et al., 2003). It is important to note that, although it may be helpful to think of the hazard as an instantaneous probability, it is not a probability, as it can take on values greater than one (Allison, 1997).
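As a concrete illustration of these identities, the short Python sketch below (our addition; the constant hazard, the grid, and all names are illustrative assumptions, not from the paper) recovers S(t,x), F(t,x) and f(t,x) numerically from a given hazard function:

```python
import numpy as np

def survival_from_hazard(h, t):
    """S(t) = exp(-int_0^t h(u) du); the integral is approximated by the trapezoidal rule."""
    cum_hazard = np.concatenate(([0.0], np.cumsum(np.diff(t) * (h[:-1] + h[1:]) / 2.0)))
    return np.exp(-cum_hazard)

t = np.linspace(0.0, 10.0, 101)
h = 0.2 * np.ones_like(t)       # constant hazard -> exponential lifetime distribution
S = survival_from_hazard(h, t)  # survival function, Eq. (3)
F = 1.0 - S                     # cumulative distribution function, Eq. (2)
f = h * S                       # density via f(t) = h(t) S(t), Eqs. (1) and (4)

# for a constant hazard of 0.2, S(5) = exp(-1) ~ 0.368
assert abs(S[50] - np.exp(-1.0)) < 1e-9
```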

A popular way of estimating the survival function is illustrated using a dataset taken from Freireich et al. (1963). The dataset contains the survival times of 42 leukaemia patients with one dichotomous feature that indicates whether or not the patient received a particular treatment. Table 1 contains the data separated according to the treatment feature.

The time is measured in weeks, and the maximum time horizon of the study (the follow-up time) is 35 weeks. Patients shown without plus signs died during the follow-up time, whereas patients shown with plus signs were censored. In essence, censoring occurs if one knows the status of an object for a particular period of time, but not for the complete follow-up time.

Generally there are three different forms of censoring, which are listed as follows (Kleinbaum, 1996):

  • (1) the event does not happen to the object before the maximum time horizon of the study;

  • (2) the object is lost to follow-up during the study;

  • (3) the object is withdrawn from the study because a different event made it impossible to follow it up any further.

The first type of censoring indicates that the event did not happen to the object during the complete follow-up time (e.g. the patient did not die during the study). Note that the patient can usually enter the study at any time during the follow-up period.

The second type of censoring occurs when one does not know the status of the object after a particular point in time. For example, the patient may have failed to attend an appointment in the clinic (Griffin, 1998).

The third type of censoring occurs because an event that is not relevant to the study happened to the object and made it impossible to follow the object up any further (e.g. the event of interest may be ‘cancer related death’ but the patient died from a car accident).

All three types of censoring are often referred to as right censoring, which is the most common form of censoring (Lawless, 1982). Another form of censoring is known as left censoring. It occurs if the status of an object is unknown at the left side of the follow-up period (e.g. the diagnosis of a disease does not necessarily mean that one knows when the disease started). If the population is both right and left censored, one speaks of interval censoring.

One non-parametric approach for estimating the survival function is the Kaplan–Meier method. Its computation is summarised in Eq. (5):

\hat{S}(t) = \prod_{j=1}^{k} \frac{n_j - d_j}{n_j} = \prod_{j=1}^{k} \left(1 - \frac{d_j}{n_j}\right) = \hat{S}(t-1)\left(1 - \frac{d_t}{n_t}\right)    (5)

Here, d_j corresponds to the number of events (e.g. deaths) at the j-th time at which one or more events occurred, and n_j corresponds to the number of objects (e.g. patients) still under observation at that time. In the medical domain, these patients are also called the risk set. Given the structure of Eq. (5), it is not surprising that the Kaplan–Meier method is often referred to as the product-limit estimator. The survival function values for the leukaemia data are summarised in Table 2 for patients with treatment and in Table 3 for patients without treatment.
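As an illustration, the following Python sketch (our addition, not the authors' implementation; variable names are illustrative) computes the product-limit estimate of Eq. (5) for the treated group of the Freireich et al. (1963) data, as commonly reproduced in the survival-analysis literature; the printed values should match those summarised in Table 2:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimator of Eq. (5); events is 1 for an event, 0 for right censoring."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])        # sorted distinct event times
    surv, s = [], 1.0
    for t in event_times:
        n_j = np.sum(times >= t)                       # size of the risk set at t
        d_j = np.sum((times == t) & (events == 1))     # number of events at t
        s *= 1.0 - d_j / n_j                           # product-limit update
        surv.append((t, s))
    return surv

# Treated group of the leukaemia data; a '+' in the text denotes censoring.
times  = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
events = [1, 1, 1, 0, 1, 0,  1,  0,  0,  1,  1,  0,  0,  0,  1,  1,  0,  0,  0,  0,  0]

for t, s in kaplan_meier(times, events):
    print(f"S({t:.0f}) = {s:.3f}")   # e.g. S(6) = 0.857, S(7) = 0.807, ...
```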

It should be noted that Kaplan–Meier estimates cannot be used directly to approximate the lifetime distribution conditional on the explanatory variables. Instead, groups containing patients with particular feature values have to be determined beforehand. Hence, the estimated survival curves are only reliable if the number of samples within each group is large and the amount of censoring is low.

The corresponding survival curves are shown in Fig. 1. Here, the solid line corresponds to the survival curve for patients who were treated, and the dashed line corresponds to the survival curve for patients without treatment.

It can clearly be seen that the survival function estimates for patients without treatment are worse than those for treated patients.

Another method for estimating lifetime distributions is the proportional hazards model, also known as the Cox model (Cox, 1972). The hazard function of an individual is estimated using Eq. (6):

h_i(t; x) = h_0(t)\, \psi(x_i)    (6)

Here, h_0(t) is the baseline hazard and \psi(x_i) is the relative hazard. The baseline hazard is the hazard for individuals with x = 0. The relative hazard is a factor that makes the hazard of individual i proportional to the baseline hazard. Proportionality means that none of the survival curves of the population cross (Collet, 1994). It also means that the logarithms of the estimated hazard functions are a constant distance apart (Marubini and Valsecchi, 1995); in other words, they should be strictly parallel (Allison, 1997). Of course, this assumption is not very realistic, as one group of patients might have higher hazard values at the beginning of the study but lower hazard values later. If the converse occurred for another group of patients, the survival curves of the two groups would cross.

To model the dependency of the hazard on the feature values of an object, one usually replaces the relative hazard with a function that depends on, for example, a linear combination of the features. The exponential function is commonly used to ensure that the relative hazard remains positive. This leads to Eq. (7):

h_i(t) = h_0(t)\, e^{\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}} = h_0(t)\, e^{\hat{\beta}' x_i}    (7)

The linear combination of the features is also referred to as the risk score or prognostic index. Unfortunately, the coefficients \hat{\beta} cannot be estimated using the ordinary maximum likelihood method, because the baseline hazard h_0 is not defined parametrically. To estimate the coefficients \hat{\beta} without specifying the baseline hazard h_0, Cox put forward the partial likelihood method. The computation of the partial likelihood is summarised in Eq. (8):

PL = \prod_{i=1}^{n} \left[ \frac{e^{\beta' x_i}}{\sum_{j=1}^{n} Y_{ij}\, e^{\beta' x_j}} \right]^{\delta_i}    (8)

Here, Y_{ij} = 1 if t_j \ge t_i and Y_{ij} = 0 if t_j < t_i, where t corresponds to the time of the event or the time of censoring. This is a convenient way to define the risk set in the denominator at the time of the ith event. The exponent \delta_i indicates whether the object/patient was censored (\delta_i = 0) or whether the event occurred (\delta_i = 1). The partial likelihood can be maximised using, for example, a version of the Newton–Raphson algorithm. It has to be noted that Eq. (8) is only valid for data without ties, in which no two events occur at the same time. The computation of the exact partial likelihood for tied data can be a daunting task (see, for example, Allison, 1997). Methods for approximating the partial likelihood for tied data are discussed in Allison (1997), Collet (1994) and Therneau and Grambsch (2000). Unfortunately, the standard Cox model (Eq. (7)) has several shortcomings, which are summarised as follows (a numerical sketch of Eq. (8) is given after the list):

  • It assumes proportional hazards;

  • The baseline hazard has to be determined in order to obtain values of the hazard/survival function for an individual;

  • The linear combination of the features cannot be used to model interaction effects;

  • The estimation of the partial likelihood is computationally very expensive when the data contain ties. This is very likely as time is often measured in a discrete domain;

  • It cannot be used (in its standard form) to model data containing time-dependent features.
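As promised above, here is a minimal Python sketch of the log partial likelihood of Eq. (8) for untied data (our illustration, not the authors' code; the synthetic data and the use of a generic optimiser in place of Newton–Raphson are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def cox_log_partial_likelihood(beta, X, times, delta):
    """Log of Eq. (8); X is (n, p), delta is 1 for an event and 0 for censoring."""
    eta = X @ beta                            # risk scores beta' x_i
    logpl = 0.0
    for i in range(len(times)):
        if delta[i] == 1:                     # only events contribute
            at_risk = times >= times[i]       # Y_ij = 1 iff t_j >= t_i
            logpl += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return logpl

# synthetic data: exponential lifetimes with hazard exp(beta' x); continuous
# times, so there are no ties and Eq. (8) applies directly
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([1.0, -0.5])
times = rng.exponential(scale=np.exp(-X @ true_beta))
delta = (rng.random(200) < 0.8).astype(int)   # roughly 20% random censoring

res = minimize(lambda b: -cox_log_partial_likelihood(b, X, times, delta),
               x0=np.zeros(2))
print("estimated coefficients:", res.x)       # should lie near [1.0, -0.5]
```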

These shortcomings can be alleviated using methods that are, for example, discussed in Marubini and Valsecchi (1995) and Therneau and Grambsch (2000). However, it has to be emphasised that the correct use of these methods is quite difficult, as they require extensive statistical knowledge. It is therefore not surprising that researchers are striving to develop more practical approaches, which make fewer assumptions and do not presume extensive statistical knowledge.

This paper investigates the use of a multi-objective evolutionary algorithm (MOEA) for survival analysis. A similar MOEA has already been applied successfully to several classification problems (Setzkorn and Paton, 2005). Although there are many successful applications of evolutionary algorithms to regression and classification problems, we believe that this is the first study that uses a MOEA for survival analysis. This serves to demonstrate the versatility of evolutionary approaches for model extraction from data.

Section snippets

Existing work

Many proposed methods for overcoming the shortcomings of the Cox model use the fact that, in the discrete time domain, the hazard corresponds to a conditional probability, as shown in Eq. (9) (Lawless, 1982; Willett, 1993):

\hat{h}(t_j, x) = \Pr(T = t_j \mid T \ge t_j,\, x)    (9)

This is in contrast to the continuous time domain, in which the hazard corresponds to a rate that can take values greater than one. In the discrete case, the survival probabilities can be computed according to Eq. (10) for each time interval t_j:

\hat{S}(t_j, x) = \prod_{k=1}^{j} \left(1 - \hat{h}(t_k, x)\right)    (10)
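A direct numerical rendering of Eq. (10) (our addition; names are illustrative): in discrete time, the survival probability is the running product of the complements of the interval hazards.

```python
import numpy as np

def discrete_survival(hazards):
    """S(t_j, x) = prod_{k=1..j} (1 - h(t_k, x)) for a sequence of interval hazards."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))

print(discrete_survival([0.10, 0.20, 0.05]))  # -> [0.9   0.72  0.684]
```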

Using a multi-objective evolutionary algorithm for survival analysis

This section describes the implemented MOEA. It begins with a brief summary of the algorithm and then describes its particular components. Fig. 2 depicts the structure of the algorithm. It is similar to other evolutionary algorithms (e.g. Michalewicz and Fogel, 2005). In broad terms, the algorithm proceeds as follows. Candidate solutions/individuals (i.e. a population of RBFNs) are initialised randomly. After this, the variation operators are applied to some of the RBFNs to recombine and/or
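The snippet is cut short here, but the loop it outlines follows the standard evolutionary pattern. A highly schematic Python sketch (all components below are assumed placeholders; the paper's actual RBFN encoding, variation operators, objectives and selection scheme are not reproduced in this snippet):

```python
import random

def run_moea(init_rbfn, recombine, mutate, evaluate, select,
             pop_size=100, generations=200):
    # random initial population of candidate RBFNs
    population = [init_rbfn() for _ in range(pop_size)]
    for _ in range(generations):
        # apply variation operators to create offspring
        offspring = []
        while len(offspring) < pop_size:
            a, b = random.sample(population, 2)
            offspring.append(mutate(recombine(a, b)))
        # evaluate() returns an objective vector, e.g. (accuracy, simplicity);
        # select() keeps pop_size individuals, e.g. by Pareto dominance
        scored = [(ind, evaluate(ind)) for ind in population + offspring]
        population = select(scored, pop_size)
    return population
```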

Results and discussions

The implemented MOEA is evaluated on several benchmark datasets. In addition, it is evaluated on a ‘real-world’ medical dataset. The benchmark datasets comprise the leukaemia dataset (Freireich et al., 1963; see also Section 1) and four artificial datasets.

The datasets were randomly split into a training dataset and a test dataset for each experiment. The training datasets contained two-thirds and the test datasets one-third of the original dataset. Before the MOEA was run, a holdout dataset was

Conclusions and further work

A multi-objective evolutionary algorithm for the extraction of models for survival analysis has been proposed and evaluated. The evaluation of the MOEA on several benchmark datasets and one medical problem has shown that the approach is capable of producing accurate and valid models. The evaluation on the artificial dataset also emphasised that the approach can cope with interaction effects and noisy non-proportional hazard distributions. This is in contrast to, for example, the Cox model. In

Acknowledgements

We would like to express our gratitude to Dr. Jessica Grabham, Dr. Tony Fisher and Dr. Peter McBurney for the suggestions they made to improve the paper.

References (55)

  • E. Biganzoli et al.

    Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach

    Stat. Med.

    (1998)
  • E. Biganzoli et al.

    Modelling cause specific hazards with radial basis functions artificial neural networks: an application to 2233 breast cancer patients

    Stat. Med.

    (2001)
  • C. Bishop

    Neural Networks for Pattern Recognition

    (1995)
  • C. Burges

    A tutorial on support vector machines for pattern recognition

    Data Min. Knowl. Disc.

    (1998)
  • T. Clark et al.

    Survival analysis. Part IV. Further concepts and methods in survival analysis

    Brit. J. Cancer

    (2003)
  • D. Collet

    Modelling Survival Data in Medical Research

    (1994)
  • D. Cox

Regression models and life tables (with discussion)

    J. Roy. Stat. Soc. Ser. B

    (1972)
  • N. Cramer

    A representation for the adaptive generation of simple sequential programs

  • B. Damato

    Ocular Tumours: Diagnosis and Treatment

    (2000)
  • K. Deb

    Multi-Objective Optimization using Evolutionary Algorithms

    (2001)
  • V. Dhar et al.

    Discovering interesting patterns for investment decision making with glower—a genetic learner overlaid with entropy reduction

    Data Min. Knowl. Disc.

    (2000)
  • T. Dietterich

    Overfitting and undercomputing in machine learning

    ACM Comput. Surv.

    (1995)
  • J. Elder et al.

    A statistical perspective on knowledge discovery in databases

  • M. Evans et al.

    Statistical Distributions

    (1993)
  • A. Freitas

    Data Mining and Knowledge Discovery With Evolutionary Algorithms

    (2002)
  • S. Geman et al.

    Neural networks and the bias/variance dilemma

    Neural Comput.

    (1992)
  • J. Grefenstette

    Lamarckian learning in multi-agent environments
