Abstract
In certain problem domains, “The Curse of Dimensionality” (Hastie et al., 2001) is well known. It is also called the problem of “High P and Low N”, in which the number of parameters far exceeds the number of samples to learn from. We describe our methods for making the most of limited samples to produce reasonably general classification rules from data with a large number of parameters. We discuss the application of this approach to classifying mesothelioma samples from baseline data according to their time to recurrence; in this case there are 12,625 inputs for each sample but only 19 samples to learn from. We reflect on the theoretical implications of the behavior of genetic programming (GP) in these extreme cases and speculate on the nature of generality.
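To make the "High P and Low N" setting concrete, here is a minimal Python sketch. It is not the authors' GP system. It generates 12,625 pure-noise features for 19 samples and reports how strongly the best of them correlates with labels that carry no signal. The Gaussian noise, the balanced random labels, the seed, and the function names are all assumptions made for illustration.

```python
import math
import random

N_SAMPLES = 19        # samples to learn from, as stated in the abstract
N_FEATURES = 12_625   # inputs per sample, as stated in the abstract


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between pure-noise features and random labels."""
    rng = random.Random(seed)
    # Hypothetical two-class labels with no relationship to any feature.
    labels = [i % 2 for i in range(n_samples)]
    rng.shuffle(labels)
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent Gaussian noise (an assumption;
        # real expression data is not distributed like this).
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best


if __name__ == "__main__":
    print(max_spurious_correlation(N_SAMPLES, N_FEATURES))
```

With this many candidate inputs and so few samples, some noise feature will usually correlate strongly with the labels purely by chance. That is why a rule that fits 19 samples well may still fail to generalize.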
References
Affymetrix (2006). Human Genome U95 Set.
Almal, A., Mitra, A., Datar, R., Lenehan, P., Fry, D., Cote, R., and Worzel, W. (2006). Using genetic programming to classify node positive patients in bladder cancer. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2006).
Daida, Jason (2004). Considering the roles of structure in problem solving by a computer. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 5, pages 67–86. Springer, Ann Arbor.
Driscoll, Joseph A., Worzel, Bill, and MacLean, Duncan (2003). Classification of gene expression data with genetic programming. In Riolo, Rick L. and Worzel, Bill, editors, Genetic Programming Theory and Practice, chapter 3, pages 25–42. Kluwer.
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. In Springer series in statistics. Springer, Berlin.
Holland, J.H. (2003). Personal communication.
Hong, Jin-Hyuk and Cho, Sung Bae (2004). Lymphoma cancer classification using genetic programming with SNR features. In Keijzer, Maarten, O’Reilly, Una-May, Lucas, Simon M., Costa, Ernesto, and Soule, Terence, editors, Genetic Programming 7th European Conference, EuroGP 2004, Proceedings, volume 3003 of LNCS, pages 78–88, Coimbra, Portugal. Springer-Verlag.
Langdon, W. and Buxton, B. (2004). Genetic programming for mining DNA chip data from cancer patients. Genetic Programming and Evolvable Machines, 5(3):251–257.
MacLean, Duncan, Wollesen, Eric A., and Worzel, Bill (2004). Listening to data: Tuning a genetic programming system. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 15, pages 245–262. Springer, Ann Arbor.
Moore, Jason H., Parker, Joel S., Olsen, Nancy J., and Aune, Thomas M. (2002). Symbolic discriminant analysis of microarray data in autoimmune disease. Genetic Epidemiology, 23:57–69.
Pass, H.I., Liu, Z., Wali, A., Bueno, R., Land, S., Lott, D., Siddiq, F., Lonardo, F., Carbone, M., and Draghici, S. (2004). Gene expression profiles predict survival and progression of pleural mesothelioma. Clinical Cancer Research, 10(3):849–859.
Poli, R. (2000). Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In Proceedings of EuroGP’2000, LNCS, pages 163–180. Springer-Verlag.
Poli, Riccardo and Langdon, W. B. (1997). Genetic programming with one-point crossover and point mutation. Technical Report CSRP-97-13, University of Birmingham, School of Computer Science, Birmingham, B15 2TT, UK.
Sastry, Kumara, O’Reilly, Una-May, and Goldberg, David E. (2004). Population sizing for genetic programming based on decision making. In O’Reilly, Una-May, Yu, Tina, Riolo, Rick L., and Worzel, Bill, editors, Genetic Programming Theory and Practice II, chapter 4, pages 49–65. Springer, Ann Arbor.
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Worzel, W.P., Almal, A., MacLean, C.D. (2007). Lifting the Curse of Dimensionality. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice IV. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-49650-4_3
DOI: https://doi.org/10.1007/978-0-387-49650-4_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-33375-5
Online ISBN: 978-0-387-49650-4
eBook Packages: Computer Science, Computer Science (R0)