Abstract
k-means is one of the fundamental and most well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve performance of data mining algorithms by performing feature construction—the process of combining multiple attributes (features) of a dataset together to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
GarcĂa, A.J., GĂłmez-Flores, W.: Automatic clustering using nature-inspired metaheuristics: a survey. Appl. Soft Comput. 41, 192–213 (2016)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)
Tseng, L.Y., Yang, S.B.: A genetic clustering algorithm for data with non-spherical-shape clusters. Pattern Recogn. 33(7), 1251–1259 (2000)
Liu, H., Motoda, H.: Feature Extraction, Construction and Selection: A Data Mining Perspective. Springer Science & Business Media, Heidelberg (1998)
Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C 40(2), 121–144 (2010)
Koza, J.R.: Genetic programming: on the programming of computers by means of natural selection, vol. 1. MIT press, Cambridge (1992)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Natural Computing Series. Springer, Heidelberg (2015)
Neshatian, K., Zhang, M., Andreae, P.: A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans. Evol. Comput. 16(5), 645–661 (2012)
Tran, B., Xue, B., Zhang, M.: Genetic programming for feature construction and selection in classification on high-dimensional data. Memet. Comput. 8(1), 3–15 (2016)
Nanda, S.J., Panda, G.: A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evol. Comput. 16, 1–18 (2014)
Aggarwal, C.C., Reddy, C.K. (eds.): Data Clustering: Algorithms and Applications. CRC Press, Boca Raton (2014)
Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, pp. 226–231 (1996)
Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Inf. Process. Lett. 76(4–6), 175–181 (2000)
Boric, N., Estévez, P.A.: Genetic programming-based clustering using an information theoretic fitness measure. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC), pp. 31–38 (2007)
Ahn, C.W., Oh, S., Oh, M.: A genetic programming approach to data clustering. In: Kim, T., Adeli, H., Grosky, W.I., Pissinou, N., Shih, T.K., Rothwell, E.J., Kang, B.-H., Shin, S.-J. (eds.) MulGraB 2011. CCIS, vol. 263, pp. 123–132. Springer, Heidelberg (2011). doi:10.1007/978-3-642-27186-1_15
Handl, J., Knowles, J.D.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)
Lichman, M.: UCI machine learning repository (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lensen, A., Xue, B., Zhang, M. (2017). New Representations in Genetic Programming for Feature Construction in k-Means Clustering. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science(), vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_44
Download citation
DOI: https://doi.org/10.1007/978-3-319-68759-9_44
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68758-2
Online ISBN: 978-3-319-68759-9
eBook Packages: Computer ScienceComputer Science (R0)