SLUG: Feature Selection Using Genetic Algorithms and Genetic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13223)

Abstract

We present SLUG, a method that uses genetic algorithms as a wrapper for genetic programming (GP), to perform feature selection while inducing models. This method is first tested on four regular binary classification datasets, and then on 10 synthetic datasets produced by GAMETES, a tool for embedding epistatic gene-gene interactions into noisy datasets. We compare the results of SLUG with the ones obtained by other GP-based methods that had already been used on the GAMETES problems, concluding that the proposed approach is very successful, particularly on the epistatic datasets. We discuss the merits and weaknesses of SLUG and its various parts, i.e. the wrapper and the learner, and we perform additional experiments, aimed at comparing SLUG with other state-of-the-art learners, like decision trees, random forests and extreme gradient boosting. Despite the fact that SLUG is not the most efficient method in terms of training time, it is confirmed as the most effective method in terms of accuracy.
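The wrapper idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learner below is a simple lookup-table classifier swapped in for GP to keep the example short, and the data generator, operators, and parameter values (`make_xor_data`, `lookup_accuracy`, `ga_wrapper`, population size, tournament size, mutation rate) are all hypothetical choices made for illustration. The synthetic label depends only on the interaction of two features, a toy stand-in for an epistatic dataset rather than GAMETES output.

```python
import random
from collections import Counter, defaultdict


def make_xor_data(n_rows, n_features, rng):
    """Toy data (assumption): label = x0 XOR x1, every other column is noise."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [row[0] ^ row[1] for row in X]
    return X, y


def lookup_accuracy(mask, X_tr, y_tr, X_va, y_va):
    """Stand-in learner (NOT GP): majority label per combination of the
    selected features, scored by accuracy on the validation split."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    votes = defaultdict(Counter)
    for row, label in zip(X_tr, y_tr):
        votes[tuple(row[c] for c in cols)][label] += 1
    default = Counter(y_tr).most_common(1)[0][0]
    hits = 0
    for row, label in zip(X_va, y_va):
        key = tuple(row[c] for c in cols)
        pred = votes[key].most_common(1)[0][0] if key in votes else default
        hits += pred == label
    return hits / len(y_va)


def ga_wrapper(X_tr, y_tr, X_va, y_va, pop_size=30, generations=20, seed=0):
    """GA over binary feature masks; fitness is the learner's validation accuracy."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_acc, best_mask = -1.0, None
    for _ in range(generations):
        scored = [(lookup_accuracy(m, X_tr, y_tr, X_va, y_va), m) for m in pop]
        gen_acc, gen_mask = max(scored, key=lambda t: t[0])
        if gen_acc > best_acc:
            best_acc, best_mask = gen_acc, gen_mask[:]
        pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            b = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            cut = rng.randrange(1, n)  # one-point crossover
            # Bit-flip mutation with probability 1/n per gene.
            child = [bit ^ (rng.random() < 1.0 / n) for bit in a[:cut] + b[cut:]]
            pop.append(child)
    return best_mask, best_acc


data_rng = random.Random(42)
X, y = make_xor_data(600, 8, data_rng)
best_mask, best_acc = ga_wrapper(X[:400], y[:400], X[400:], y[400:])
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("validation accuracy:", best_acc)
```

Because each mask is scored by training and evaluating a model, the cost of the wrapper grows with the population size times the number of generations times the cost of one learner run, which is consistent with the abstract's remark that this kind of approach is not the most efficient in training time.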


Notes

  1. https://github.com/jespb/Python-STGP and https://github.com/jespb/Python-M3GP.

  2. We performed 30 runs using the same total number of comparisons as SLUG using the STGP (10000 individuals and 1500 generations). With this, the median test accuracy achieved was 0.4982, while the best was 0.5348.

Acknowledgment

This work was supported by FCT, Portugal, through funding of LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020); MAR2020 program via project MarCODE (MAR-01.03.01-FEAMP-0047); projects BINDER (PTDC/CCI-INF/29168/2017), AICE (DSAIPA/DS/0113/2019), OPTOX (PTDC/CTA-AMB/30056/2017) and GADgET (DSAIPA/DS/0022/2018). Nuno Rodrigues and João Batista were supported by PhD Grants 2021/05322/BD and SFRH/BD/143972/2019, respectively; William La Cava was supported by the National Library of Medicine of the National Institutes of Health under Award Number R00LM012926.

Author information

Corresponding author

Correspondence to Nuno M. Rodrigues.

Appendix

Table 4. Holm corrected p-values using Kruskal-Wallis for the regular classification problems.
Table 5. Holm corrected p-values using Kruskal-Wallis for the GAMETES problems.
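The table captions above refer to Holm-corrected p-values. As a reminder of what that correction does, here is a minimal sketch of the standard Holm step-down adjustment. It is a generic illustration, not the authors' analysis script; the function name `holm_correct` and the example p-values are hypothetical and are not taken from the paper's tables.

```python
def holm_correct(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
example = [0.01, 0.04, 0.03]
print(holm_correct(example))
```

The smallest raw p-value is multiplied by the number of comparisons, the next by one less, and so on, with a running maximum keeping the adjusted values monotone.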

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rodrigues, N.M., Batista, J.E., La Cava, W., Vanneschi, L., Silva, S. (2022). SLUG: Feature Selection Using Genetic Algorithms and Genetic Programming. In: Medvet, E., Pappa, G., Xue, B. (eds) Genetic Programming. EuroGP 2022. Lecture Notes in Computer Science, vol 13223. Springer, Cham. https://doi.org/10.1007/978-3-031-02056-8_5

  • DOI: https://doi.org/10.1007/978-3-031-02056-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-02055-1

  • Online ISBN: 978-3-031-02056-8

  • eBook Packages: Computer Science, Computer Science (R0)
