Abstract
This paper describes a genetic programming (GP) approach to medical data classification problems. In this approach, the evolved genetic programs are simplified online during the evolutionary process using algebraic simplification rules, algebraic equivalence and prime techniques. The new simplification GP approach is examined and compared with the standard GP approach on two medical data classification problems. The results suggest that, on these problems, the new simplification GP approach can be more efficient than the basic GP system, with slightly better classification performance, and can also significantly reduce the sizes of evolved programs. Comparison with other methods, including decision trees, naive Bayes, nearest neighbour, nearest centroid and neural networks, suggests that the new GP approach achieved better results than almost all of these methods on these problems. The evolved genetic programs are also easier to interpret than the “hidden patterns” discovered by the other methods.
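The abstract names the ingredients of the online simplification but not their details. The sketch below is our own illustration under stated assumptions, not the authors' implementation. Programs are nested tuples `(op, left, right)` over `+`, `-` and `*`. The rule set is a small hypothetical one: constant folding, additive and multiplicative identities, and multiplication by zero. We assume that "algebraic equivalence and prime techniques" can be approximated by hash coding, that is, evaluating two subtrees at random points modulo a large prime `P` and treating equal residues as evidence that the subtrees are equivalent. All names (`simplify`, `equivalent`, `mod_eval`, `leaves`, `size`, `P`) are hypothetical.

```python
import random

# Illustrative large prime modulus (2**61 - 1 is a Mersenne prime).
P = 2**61 - 1


def is_num(e):
    return isinstance(e, (int, float))


def size(e):
    """Number of nodes in an expression tree."""
    return 1 + size(e[1]) + size(e[2]) if isinstance(e, tuple) else 1


def leaves(e, out):
    """Collect the terminals (feature names and constants) of a tree into `out`."""
    if isinstance(e, tuple):
        leaves(e[1], out)
        leaves(e[2], out)
    else:
        out.add(e)
    return out


def mod_eval(e, env):
    """Evaluate a tree over the integers modulo P, with terminals mapped by `env`."""
    if not isinstance(e, tuple):
        return env[e]
    op, x, y = e[0], mod_eval(e[1], env), mod_eval(e[2], env)
    if op == "+":
        return (x + y) % P
    if op == "-":
        return (x - y) % P
    return (x * y) % P  # op == "*"


def equivalent(a, b, trials=3):
    """Probabilistic equivalence test: equal residues at random points mod P.

    Integer constants keep their value mod P; feature names and non-integer
    constants are treated as independent unknowns, so the test is conservative
    (it may miss some equivalences) and false positives are extremely unlikely.
    """
    terms = leaves(b, leaves(a, set()))
    for _ in range(trials):
        env = {t: int(t) % P if is_num(t) and float(t).is_integer()
               else random.randrange(P) for t in terms}
        if mod_eval(a, env) != mod_eval(b, env):
            return False
    return True


def simplify(e):
    """Bottom-up rewriting with a few algebraic rules (illustrative rule set)."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if is_num(a) and is_num(b):                 # constant folding
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    if op == "+" and a == 0:                    # 0 + x -> x
        return b
    if op in ("+", "-") and b == 0:             # x + 0 -> x, x - 0 -> x
        return a
    if op == "-" and equivalent(a, b):          # x - x' -> 0 when x is equivalent to x'
        return 0
    if op == "*":
        if a == 0 or b == 0:                    # x * 0 -> 0
            return 0
        if a == 1:                              # 1 * x -> x
            return b
        if b == 1:                              # x * 1 -> x
            return a
    return (op, a, b)


if __name__ == "__main__":
    expr = ("-", ("*", ("+", "F0", 0), "F5"), ("*", "F5", "F0"))
    print(size(expr), simplify(expr))
```

For example, `(F0 + 0) * F5 - F5 * F0` (9 nodes) reduces to the constant `0` (1 node), because `F0 * F5` and `F5 * F0` always agree modulo `P`. In a GP run, a step like `simplify` would presumably be applied to each new program after crossover or mutation, and `size` gives a direct measure of the shrinkage. Division is left out so that the modular evaluation stays well defined; a protected-division operator would need its own handling.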
Notes
Initial experiments applying the simplification algorithm to two symbolic regression problems gave results consistent with the discussion here.
“Super-features” here refer to the combined expressions between `if` and `then` in Fig. 4, such as \((((F_5 + F_0 - 0.48) \times 0.383 + F_5) \times 0.0059)\).
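As an illustration (our own arithmetic, not taken from the paper), expanding the example super-feature shows that it is a weighted linear combination of the two features \(F_0\) and \(F_5\) plus a constant offset:

\[
\begin{aligned}
((F_5 + F_0 - 0.48) \times 0.383 + F_5) \times 0.0059
&= (1.383\,F_5 + 0.383\,F_0 - 0.18384) \times 0.0059 \\
&= 0.0081597\,F_5 + 0.0022597\,F_0 - 0.001084656
\end{aligned}
\]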
Acknowledgements
This work was supported in part by the Marsden Fund of the Royal Society of New Zealand (grant no. 05-VUW-017) and by the University Research Fund (grant 7/39) at Victoria University of Wellington.
About this article
Cite this article
Zhang, M., Wong, P. Genetic programming for medical classification: a program simplification approach. Genet Program Evolvable Mach 9, 229–255 (2008). https://doi.org/10.1007/s10710-008-9059-9
DOI: https://doi.org/10.1007/s10710-008-9059-9