Genetic programming for medical classification: a program simplification approach

  • Original Paper
Genetic Programming and Evolvable Machines

Abstract

This paper describes a genetic programming (GP) approach to medical data classification problems. In this approach, the evolved genetic programs are simplified online during the evolutionary process using algebraic simplification rules, algebraic equivalence and prime techniques. The new simplification GP approach is examined and compared with the standard GP approach on two medical data classification problems. The results suggest that, on these problems, the new approach can be more efficient than the basic GP system, with slightly better classification performance, and can also significantly reduce the sizes of the evolved programs. Comparison with other methods, including decision trees, naive Bayes, nearest neighbour, nearest centroid and neural networks, suggests that the new GP approach achieved better results than almost all of these methods on these problems. The evolved genetic programs are also easier to interpret than the “hidden patterns” discovered by the other methods.
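
As a rough illustration of what online simplification with algebraic rules can look like, the sketch below rewrites an expression tree bottom-up using a handful of identities: constant folding, x + 0 → x, x − 0 → x, x − x → 0, x × 1 → x, x × 0 → 0 and x / 1 → x. The tuple-based tree encoding, the helper names `is_const` and `simplify`, and the chosen rule set are our own assumptions, not the paper's actual rule base. Division is only folded when the denominator is a non-zero constant, because GP systems usually use a protected division whose exact behaviour varies.

```python
from typing import Union

# Hypothetical encoding: a float/int is a constant, a str is a feature
# terminal such as "F0", and a tuple (op, left, right) is a function node.
Expr = Union[float, int, str, tuple]


def is_const(e: Expr) -> bool:
    return isinstance(e, (int, float))


def simplify(e: Expr) -> Expr:
    """Simplify children first, then apply a few algebraic identities at this node."""
    if not isinstance(e, tuple):
        return e
    op = e[0]
    a, b = simplify(e[1]), simplify(e[2])

    # Constant folding: both children are constants.
    if is_const(a) and is_const(b):
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        if op == "/" and b != 0:  # leave (protected) division by zero untouched
            return a / b

    if op == "+":
        if is_const(a) and a == 0:
            return b  # 0 + x -> x
        if is_const(b) and b == 0:
            return a  # x + 0 -> x
    elif op == "-":
        if is_const(b) and b == 0:
            return a  # x - 0 -> x
        if a == b:
            return 0.0  # x - x -> 0 (structurally identical subtrees)
    elif op == "*":
        if (is_const(a) and a == 0) or (is_const(b) and b == 0):
            return 0.0  # x * 0 -> 0
        if is_const(a) and a == 1:
            return b  # 1 * x -> x
        if is_const(b) and b == 1:
            return a  # x * 1 -> x
    elif op == "/":
        if is_const(b) and b == 1:
            return a  # x / 1 -> x

    return (op, a, b)


# Example: (F0 * 1) + (F3 - F3) collapses to the single terminal F0.
print(simplify(("+", ("*", "F0", 1.0), ("-", "F3", "F3"))))  # F0
print(simplify(("*", ("+", 2.0, 3.0), "F1")))                # ('*', 5.0, 'F1')
```

Because the rewriting is bottom-up, an identity exposed by simplifying a child (here F3 − F3 becoming 0) can fire at the parent in the same pass, so a single traversal per program is often enough for rules of this kind.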

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. Initial experiments applying the simplification algorithm to two symbolic regression problems gave results consistent with the discussion here.

  2. “Super-features” here refer to the combined expressions between \({\tt if}\) and \({\tt then}\) in Fig. 4, such as \({\tt (((F_5 + F_0 - 0.48) \times 0.383 + F_5) \times 0.0059)}\).
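
The abstract names algebraic equivalence and prime techniques without further detail here. One standard way to test whether two program trees compute the same function is to evaluate both at random points using arithmetic modulo a large prime: a single mismatch proves the trees differ, while agreement at several random points makes equivalence highly likely. The sketch below applies this idea to the super-feature from note 2 and a hand-expanded version of it. We assume this is roughly what the prime technique refers to; the modulus `P`, the helpers `to_field`, `eval_mod_p` and `equivalent`, the tuple encoding and the restriction to +, − and × are our own illustrative choices.

```python
import random
from fractions import Fraction

# Hypothetical modulus: the Mersenne prime 2**31 - 1.
P = 2**31 - 1


def to_field(c) -> int:
    """Map a decimal constant such as 0.48 (= 12/25) exactly into Z_p,
    using the modular inverse of its denominator."""
    q = Fraction(str(c))
    return q.numerator * pow(q.denominator, -1, P) % P


def eval_mod_p(e, env) -> int:
    """Evaluate a tuple-encoded tree (+, -, * only) over Z_p."""
    if isinstance(e, str):
        return env[e]
    if not isinstance(e, tuple):
        return to_field(e)
    op, a, b = e
    x, y = eval_mod_p(a, env), eval_mod_p(b, env)
    if op == "+":
        return (x + y) % P
    if op == "-":
        return (x - y) % P
    if op == "*":
        return (x * y) % P
    raise ValueError(f"unsupported operator: {op}")


def equivalent(e1, e2, variables, trials=5, seed=0) -> bool:
    """Randomized equivalence test: any mismatch proves the trees differ;
    agreement on every trial means they are equivalent with high probability."""
    rng = random.Random(seed)
    for _ in range(trials):
        env = {v: rng.randrange(P) for v in variables}
        if eval_mod_p(e1, env) != eval_mod_p(e2, env):
            return False
    return True


# Super-feature from note 2: (((F5 + F0 - 0.48) * 0.383 + F5) * 0.0059)
super_feature = ("*", ("+", ("*", ("-", ("+", "F5", "F0"), 0.48), 0.383), "F5"), 0.0059)
# Hand-expanded form: (1.383*F5 + 0.383*F0 - 0.18384) * 0.0059
expanded = ("*", ("-", ("+", ("*", 1.383, "F5"), ("*", 0.383, "F0")), 0.18384), 0.0059)

print(equivalent(super_feature, expanded, ["F0", "F5"]))             # True
print(equivalent(super_feature, ("*", "F5", 0.0059), ["F0", "F5"]))  # False
```

Working in exact modular arithmetic rather than floating point means that rounding in products such as 0.48 × 0.383 cannot make two equivalent trees look different. Supporting division would additionally require handling denominators that happen to be zero modulo the prime.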

Acknowledgements

This work was supported in part by the Marsden Fund of the Royal Society of New Zealand under grant No. 05-VUW-017 and by the University Research Fund 7/39 at Victoria University of Wellington.

Author information

Correspondence to Mengjie Zhang.

About this article

Cite this article

Zhang, M., Wong, P. Genetic programming for medical classification: a program simplification approach. Genet Program Evolvable Mach 9, 229–255 (2008). https://doi.org/10.1007/s10710-008-9059-9

  • DOI: https://doi.org/10.1007/s10710-008-9059-9
