New Representations in Genetic Programming for Feature Construction in k-Means Clustering

  • Conference paper
Simulated Evolution and Learning (SEAL 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10593)

Abstract

k-means is one of the most fundamental and best-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction, the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
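
The idea of feature construction for k-means can be illustrated with a small sketch. It is not the representation proposed in the paper: the expression tree below is hand-written rather than evolved by GP, and the function set, the `evaluate` helper, the toy two-ring dataset, and the `agreement` score are all assumptions made for illustration. It shows how a single constructed feature (here, distance from the origin) can let plain k-means separate two concentric rings that it cannot separate on the raw features.

```python
import math

# Hypothetical function set for a GP-style expression tree (an assumption).
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sqrt": lambda a: math.sqrt(abs(a)),  # protected square root
}

def evaluate(tree, row):
    """Evaluate a tree: ints are feature indices, tuples are function nodes."""
    if isinstance(tree, int):
        return row[tree]
    name, *children = tree
    return FUNCTIONS[name](*(evaluate(child, row) for child in children))

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm (k >= 2), seeded with evenly spaced points."""
    step = (len(points) - 1) // (k - 1)
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

def ring(radius, n):
    """n evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def agreement(labels, truth):
    """Fraction of matching labels for two clusters, ignoring label order."""
    same = sum(a == b for a, b in zip(labels, truth))
    return max(same, len(truth) - same) / len(truth)

# Toy data: two concentric rings, which no straight boundary can separate.
raw = ring(1.0, 20) + ring(5.0, 20)
truth = [0] * 20 + [1] * 20

# Hand-written constructed feature: sqrt(x0*x0 + x1*x1).
tree = ("sqrt", ("add", ("mul", 0, 0), ("mul", 1, 1)))
constructed = [(evaluate(tree, row),) for row in raw]

raw_score = agreement(kmeans(raw, 2), truth)
constructed_score = agreement(kmeans(constructed, 2), truth)
print("raw features:       ", raw_score)
print("constructed feature:", constructed_score)
```

With two centroids, k-means splits the raw feature space along a straight boundary, so it cannot isolate the inner ring from the outer one. The constructed feature maps each ring to a single value, which makes the two groups trivially separable. In a GP approach, trees like `tree` would be searched for automatically rather than written by hand.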

Author information

Corresponding author

Correspondence to Andrew Lensen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lensen, A., Xue, B., Zhang, M. (2017). New Representations in Genetic Programming for Feature Construction in k-Means Clustering. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science, vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_44

  • DOI: https://doi.org/10.1007/978-3-319-68759-9_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68758-2

  • Online ISBN: 978-3-319-68759-9

  • eBook Packages: Computer Science, Computer Science (R0)
