abstract = "In this paper we discuss heterogeneous estimation
model ensembles for cancer diagnoses produced using
various machine learning algorithms. Based on patients'
data records including standard blood parameters,
tumour markers, and information about the diagnosis of
tumours, the goal is to identify mathematical models for
estimating cancer diagnoses. Several machine learning
approaches implemented in HeuristicLab and WEKA have
been applied for identifying estimators for selected
cancer diagnoses: k-nearest neighbour learning,
decision trees, artificial neural networks, support
vector machines, random forests, and genetic
programming. The models produced using these methods
have been combined into heterogeneous model ensembles.
All models trained during the learning phase are
applied during the test phase; the final classification
is annotated with a confidence value that specifies how
reliable the models are regarding the presented
decision: we calculate the final estimation for each
sample via majority voting, and the relative ratio of a
sample's majority vote is used for calculating the
confidence in the final estimation. We use a confidence
threshold that specifies the minimum confidence level
that has to be reached; if this threshold is not
reached for a sample, then there is no prediction for
that specific sample.
As we show in the results section, the accuracies of
diagnoses of breast cancer, melanoma, and respiratory
system cancer can thus be increased significantly. We see
that increasing the confidence threshold leads to
higher classification accuracies, bearing in mind that
the proportion of samples for which a classification is
made decreases significantly.",