Generating Redundant Features with Unsupervised Multi-tree Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10781)

Abstract

Recently, feature selection has become an increasingly important area of research due to the surge in high-dimensional datasets in all areas of modern life. A plethora of feature selection algorithms have been proposed, but it is difficult to truly analyse the quality of a given algorithm. Ideally, an algorithm would be evaluated by measuring how well it removes known bad features. Acquiring datasets with such features is inherently difficult, and so a common technique is to add synthetic bad features to an existing dataset. While adding noisy features is an easy task, it is very difficult to automatically add complex, redundant features. This work proposes one of the first approaches to generating redundant features, using a novel genetic programming approach. Initial experiments show that our proposed method can automatically create difficult, redundant features which have the potential to be used for creating high-quality feature selection benchmark datasets.
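The abstract contrasts two kinds of synthetic "bad" features: noisy features, which are easy to add, and redundant features, which duplicate information already present in the data. The following is a minimal Python sketch of that contrast. It is not the authors' genetic programming method. It makes several assumptions:

- the data is a plain list of lists of floats;
- the redundant feature is one hand-picked linear transform of an existing feature;
- Pearson correlation stands in as a simple redundancy check.

The helper names (`add_noisy_feature`, `add_redundant_feature`, `column`) are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def add_noisy_feature(rows, rng):
    """Return a copy of rows with one extra column of uniform noise.

    The new column is independent of every existing feature, so it is
    irrelevant but not redundant.
    """
    return [row + [rng.uniform(0.0, 1.0)] for row in rows]


def add_redundant_feature(rows, source, rng, noise_sd=0.1):
    """Return a copy of rows with one extra column derived from column `source`.

    Hypothetical example: a fixed linear transform plus small Gaussian noise,
    so the new column largely duplicates the information in `source`.
    """
    return [row + [3.0 * row[source] + 1.0 + rng.gauss(0.0, noise_sd)] for row in rows]


def column(rows, j):
    """Extract column j from a list-of-lists dataset."""
    return [row[j] for row in rows]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy dataset: 500 instances with two independent uniform features.
    data = [[rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)] for _ in range(500)]

    noisy = add_noisy_feature(data, rng)
    redundant = add_redundant_feature(data, 0, rng)

    # The noisy column should be nearly uncorrelated with feature 0,
    # while the redundant column should be almost perfectly correlated with it.
    print("noisy vs f0:    ", round(pearson(column(noisy, 0), column(noisy, 2)), 3))
    print("redundant vs f0:", round(pearson(column(redundant, 0), column(redundant, 2)), 3))
```

The redundant feature here is deliberately simple. A single linear transform of one feature is easy for a correlation-based filter to flag. That is why the abstract argues for automatically generating more complex redundant features that are harder for feature selection algorithms to detect.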

Acknowledgement

The authors would like to thank Tony Butler-Yeoman for his help in developing the initial ideas, and suggestions throughout the development of this work.

Author information

Correspondence to Andrew Lensen.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Lensen, A., Xue, B., Zhang, M. (2018). Generating Redundant Features with Unsupervised Multi-tree Genetic Programming. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2018. Lecture Notes in Computer Science, vol 10781. Springer, Cham. https://doi.org/10.1007/978-3-319-77553-1_6

  • DOI: https://doi.org/10.1007/978-3-319-77553-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77552-4

  • Online ISBN: 978-3-319-77553-1

  • eBook Packages: Computer Science, Computer Science (R0)
