title = "Prediction of cancer class with majority voting
genetic programming classifier using gene expression
data",
journal = "IEEE/ACM Transactions on Computational Biology and
Bioinformatics",
year = "2009",
month = apr # "-" # jun,
volume = "6",
number = "2",
pages = "353--367",
keywords = "genetic algorithms, genetic programming, Classifier
design and evaluation, Data mining, Feature extraction
or construction, Evolutionary computing, AdaBoost.M1,
kNN, SVM, RPMBGA, EGPC Java, MVGPC",
ISSN = "1545-5963",
DOI = "doi:10.1109/TCBB.2007.70245",
size = "14 pages",
abstract = "In order to get a better understanding of different
types of cancers and to find the possible biomarkers
for diseases, recently many researchers are analysing
the gene expression data using various machine learning
techniques. However, due to smaller number of training
samples compared to huge number of genes and class
imbalance, most of these methods suffer from
over-fitting. In this article, we present a majority
voting genetic programming classifier (MVGPC) for
classification of microarray data. Instead of a single
rule or a single set of rules, we evolve multiple rules
with genetic programming and then apply those rules to
test samples to determine their labels with majority
voting technique. By performing experiments on four
different public cancer data sets, including multiclass
data sets, we have found that the test accuracies of
MVGPC are better than those of other methods including
AdaBoost with genetic programming. Moreover, some of
the more frequently occurring genes in the
classification rules are known to be associated with
the types of cancers being studied in this article.",
notes = "4 genechip datasets (brain cancer, prostate cancer,
breast cancer, lung carcinoma). Small sample sizes: 50,
102, 22, 203. Preprocessing reduces to 4434, 5966,
3226, 3312 genes. Affymetrix software (gene present (P),
marginal (M), or absent (A)). pop=4000, max rule
size=100, elitism. GP classifiers combined externally
by fixed rule (i.e. majority voting). MVGPC
multiclass=multiple one vs rest. Point mutation,
overfitting, log transforms. {"}it may not matter
whether the data are normalised or not.{"} No difference
(at 5 percent) between scaled and non-scaled.
Also known as \cite{10.1109/TCBB.2007.70245}
\cite{4359894}",