Neoplasia

Volume 9, Issue 4, April 2007, Pages 292-303, IN1-IN3

Feature Selection and Molecular Classification of Cancer Using Genetic Programming

https://doi.org/10.1593/neo.07121
Under a Creative Commons license
open access

Abstract

Despite important advances in microarray-based molecular classification of tumors, its application in clinical settings remains a formidable challenge. This is in part due to the limitations of current analysis programs in discovering robust biomarkers and developing classifiers with a practical set of genes. Genetic programming (GP) is a type of machine learning technique that uses an evolutionary algorithm to simulate natural selection as well as population dynamics, hence leading to simple and comprehensible classifiers. Here we applied GP to cancer expression profiling data to select feature genes and build molecular classifiers by mathematical integration of these genes. Analysis of thousands of GP classifiers generated for a prostate cancer data set revealed repetitive use of a set of highly discriminative feature genes, many of which are known to be disease associated. GP classifiers often comprise five or fewer genes and successfully predict cancer types and subtypes. More importantly, GP classifiers generated in one study are able to predict samples from an independent study, which may have used different microarray platforms. In addition, GP yielded classification accuracy better than or similar to conventional classification methods. Furthermore, the mathematical expression of GP classifiers provides insights into relationships between classifier genes. Taken together, our results demonstrate that GP may be valuable for generating effective classifiers containing a practical set of genes for diagnostic/prognostic cancer classification.
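
The idea of a classifier built by "mathematical integration" of a handful of genes can be illustrated with a small sketch. This is not the authors' implementation: the gene names (GENE_A, GENE_B, GENE_C), the expression itself, the decision threshold at zero, and the toy expression values are all hypothetical, chosen only to show how an arithmetic expression tree over a few genes can act as a classifier and how its accuracy (a typical GP fitness measure) could be scored. The evolutionary search that would generate such trees is omitted.

```python
import operator

# Arithmetic operators allowed at internal nodes of the expression tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, sample):
    """Recursively evaluate an expression tree on one sample.

    A tree is a gene name (str), a numeric constant, or a tuple
    (op, left, right). A sample maps gene names to expression values.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, sample), evaluate(right, sample))
    if isinstance(tree, str):
        return sample[tree]
    return tree


def predict(tree, sample):
    """Assumed decision rule: class 1 if the expression is positive, else 0."""
    return 1 if evaluate(tree, sample) > 0 else 0


def accuracy(tree, samples, labels):
    """Fraction of samples classified correctly; usable as a GP fitness score."""
    hits = sum(predict(tree, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


def genes_used(tree):
    """Collect the set of gene names that appear in the expression tree."""
    if isinstance(tree, tuple):
        return genes_used(tree[1]) | genes_used(tree[2])
    return {tree} if isinstance(tree, str) else set()


# Hypothetical three-gene classifier: GENE_A * GENE_C - GENE_B > 0 -> class 1.
classifier = ("-", ("*", "GENE_A", "GENE_C"), "GENE_B")

# Made-up expression values for four samples (1 = tumor, 0 = benign).
samples = [
    {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.5},
    {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
    {"GENE_A": 1.8, "GENE_B": 0.9, "GENE_C": 1.2},
    {"GENE_A": 0.7, "GENE_B": 1.5, "GENE_C": 0.8},
]
labels = [1, 0, 1, 0]

print("genes used:", sorted(genes_used(classifier)))
print("accuracy:", accuracy(classifier, samples, labels))
```

Because the classifier is an explicit expression, the genes it uses can be read directly from the tree. Counting how often each gene appears across many such trees would be one way to approach the feature selection the abstract describes.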

Keywords

Molecular diagnostics
biomarkers
prostate cancer
evolutionary algorithm
microarray profiling

This research was supported in part by the National Institutes of Health (R01 CA97063 to A.M.C. and D.G., U54 DA021519-01 A1 to A.M.C., Prostate SPORE P50CA69568 to A.M.C.), the Early Detection Research Network (U01 CA111275 to A.M.C. and D.G.), the National Institute of General Medical Sciences (GM 72007 to D.G.), the Department of Defense (W81XWH-06-1-0224 to A.M.C., PC060266 to J.Y.), and the Cancer Center Bioinformatics Core (support grant 5P30 CA46592 to A.M.C.). A.M.C. is supported by a Clinical Translational Research Award from the Burroughs Wellcome Fund.