
Genetic Programming with Interval Functions and Ensemble Learning for Classification with Incomplete Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11320)

Abstract

Missing values are an unavoidable issue in many real-world datasets. Classification with incomplete data must be handled carefully, because inadequate treatment of missing values often leads to large classification errors. Interval genetic programming (IGP) uses genetic programming to directly evolve an effective and efficient classifier for incomplete data. This paper proposes a method that improves IGP for classification with incomplete data by integrating it with ensemble learning to build a set of classifiers. Experimental results show that combining IGP with ensemble learning to evolve a set of classifiers for incomplete data achieves better accuracy than IGP alone. The proposed method is also more accurate than other common methods for classification with incomplete data.
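The abstract does not spell out how the interval functions or the ensemble are implemented, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that an observed feature value v becomes the degenerate interval [v, v], that a missing value becomes the interval spanning that feature's observed range, that program nodes use standard interval arithmetic, that a binary label is read from the sign of the output interval's midpoint, and that ensemble members are combined by majority vote. All names (`Interval`, `feature_ranges`, `to_intervals`, `classify`, `ensemble_predict`, `members`) are hypothetical, and the three "evolved" programs are hand-written stand-ins; no genetic search is performed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] used as the value type of every program node."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        # The product's bounds are the min and max of the four endpoint products.
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def midpoint(self) -> float:
        return (self.lo + self.hi) / 2.0


Row = Sequence[Optional[float]]  # None marks a missing value
Program = Callable[[List[Interval]], Interval]


def feature_ranges(data: Sequence[Row]) -> List[Tuple[float, float]]:
    """Observed (min, max) per feature; assumes each feature has at least one observed value."""
    ranges = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        ranges.append((min(observed), max(observed)))
    return ranges


def to_intervals(row: Row, ranges: Sequence[Tuple[float, float]]) -> List[Interval]:
    """Assumption: observed v -> [v, v]; missing -> the feature's observed range."""
    return [Interval(v, v) if v is not None else Interval(*ranges[j])
            for j, v in enumerate(row)]


def classify(program: Program, x: List[Interval]) -> int:
    """Assumption: class 1 if the output interval's midpoint is >= 0, else class 0."""
    return 1 if program(x).midpoint() >= 0 else 0


def ensemble_predict(programs: Sequence[Program], x: List[Interval]) -> int:
    """Majority vote over the members' labels (use an odd number of members to avoid ties)."""
    votes = [classify(p, x) for p in programs]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical hand-written programs standing in for evolved GP individuals.
members: List[Program] = [
    lambda x: x[0] - x[1],
    lambda x: x[0] * x[1],
    lambda x: x[1] - Interval(1.0, 1.0),
]

data = [[1.0, 2.0], [3.0, None], [None, -1.0], [2.0, 0.5]]
ranges = feature_ranges(data)
for row in data:
    print(row, "->", ensemble_predict(members, to_intervals(row, ranges)))
```

The interval representation lets every program evaluate an incomplete instance without imputation: a missing input simply widens the output interval, and the ensemble vote can smooth over members whose wide outputs make their individual decisions unreliable.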



Author information

Corresponding author

Correspondence to Cao Truong Tran.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Tran, C.T., Zhang, M., Xue, B., Andreae, P. (2018). Genetic Programming with Interval Functions and Ensemble Learning for Classification with Incomplete Data. In: Mitrovic, T., Xue, B., Li, X. (eds) AI 2018: Advances in Artificial Intelligence. AI 2018. Lecture Notes in Computer Science, vol 11320. Springer, Cham. https://doi.org/10.1007/978-3-030-03991-2_53

  • DOI: https://doi.org/10.1007/978-3-030-03991-2_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03990-5

  • Online ISBN: 978-3-030-03991-2

  • eBook Packages: Computer Science, Computer Science (R0)
