
Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study

  • Conference paper

Advances in Big Data (INNS 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 529)

Abstract

The amount of data available for data mining and knowledge discovery continues to grow rapidly in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: coping with the sheer volume of available data. Active Sampling, already used in Active Learning, may be a good way to improve the training of Evolutionary Algorithms (EA) on very large datasets. This paper reviews sampling techniques already used with active GP learners and discusses their ability to improve GP training on very large datasets. One method from each sampling strategy is implemented and applied to the KDD intrusion detection problem using closely matched parameters. Experimental results show that the sampling methods outperform the results obtained with the full dataset, but some of them cannot be scaled to large datasets.
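
The abstract contrasts training GP on the full dataset with training on a sampled subset of fitness cases. As an illustration only, the sketch below shows two sampling schemes widely discussed in the GP literature: Random Subset Selection (RSS), which draws a fresh uniform subset each generation, and Dynamic Subset Selection (DSS) in the spirit of Gathercole and Ross, which favours cases that are often misclassified or have not been seen for a while. It is not the authors' implementation. The class and method names, the exponents `d_exp=1.0` and `a_exp=3.5`, the reset rules, and the choice to start every case's age at 1 are assumptions made for this sketch.

```python
import random


def random_subset(n_cases, subset_size, rng):
    """Random Subset Selection (RSS): draw a fresh uniform sample of
    fitness-case indices, without replacement, every generation."""
    return rng.sample(range(n_cases), subset_size)


class DSSSampler:
    """Hypothetical sketch of Dynamic Subset Selection (DSS).

    Each fitness case i carries a difficulty D[i] (misclassifications recorded
    during the last generation in which it was selected) and an age A[i]
    (generations since it was last selected). Its weight is
    W[i] = D[i]**d_exp + A[i]**a_exp, and it enters the next subset with
    probability min(1, W[i] * subset_size / sum(W)), so the expected subset
    size is roughly subset_size.
    """

    def __init__(self, n_cases, subset_size, d_exp=1.0, a_exp=3.5, seed=None):
        self.n_cases = n_cases
        self.subset_size = subset_size
        self.d_exp = d_exp  # assumed exponent on difficulty
        self.a_exp = a_exp  # assumed exponent on age
        self.rng = random.Random(seed)
        self.difficulty = [0] * n_cases
        # Assumption: ages start at 1 so the first weights are uniform, not all zero.
        self.age = [1] * n_cases

    def weights(self):
        """Current selection weight of every fitness case."""
        return [d ** self.d_exp + a ** self.a_exp
                for d, a in zip(self.difficulty, self.age)]

    def select(self):
        """Draw the subset for the next generation and update ages."""
        w = self.weights()
        total = sum(w)
        if total == 0:
            # Degenerate case (every weight is zero): fall back to uniform odds.
            probs = [self.subset_size / self.n_cases] * self.n_cases
        else:
            probs = [min(1.0, wi * self.subset_size / total) for wi in w]
        chosen = [i for i, p in enumerate(probs) if self.rng.random() < p]
        chosen_set = set(chosen)
        for i in range(self.n_cases):
            if i in chosen_set:
                # Selected cases restart their age and difficulty counts.
                self.age[i] = 0
                self.difficulty[i] = 0
            else:
                self.age[i] += 1
        return chosen

    def record_error(self, case_index, count=1):
        """Call when individuals misclassify a case from the current subset."""
        self.difficulty[case_index] += count
```

In a GP loop, one would call `select()` once per generation, evaluate the population only on the returned indices, and call `record_error(i)` whenever an individual misclassifies case `i`. The age term keeps easy cases from being starved: once a case has been left out for a few generations, its weight grows quickly and it returns to the subset.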


Notes

  1. See UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/.


Author information

Correspondence to Hmida Hmida.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hmida, H., Hamida, S.B., Borgi, A., Rukoz, M. (2017). Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_6


  • DOI: https://doi.org/10.1007/978-3-319-47898-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47897-5

  • Online ISBN: 978-3-319-47898-2

  • eBook Packages: Engineering, Engineering (R0)
