Abstract
According to the GIGO (Garbage In, Garbage Out) principle, the quality of the input data strongly influences, or even determines, the quality of the output of any machine learning, big data, or data mining algorithm. The input data, often represented by a set of features, may suffer from many issues. Feature manipulation is an effective means of improving the quality of the feature set, but it is a challenging task. Evolutionary computation (EC) techniques have shown advantages and achieved good performance in feature manipulation. This paper reviews recent advances in EC-based feature manipulation methods for classification, clustering, regression, incomplete data, and image analysis, to present the community with state-of-the-art work in the field.
Index Terms
- Evolutionary feature manipulation in data mining/big data