Genetic programming for feature construction and selection in classification on high-dimensional data
journal contribution
posted on 2021-03-25, 21:03 authored by Binh Tran, Bing Xue, Mengjie Zhang
Classification on high-dimensional data with
thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the
quality of the feature set. The problem can be addressed
by using feature selection to choose only informative
features or feature construction to create new high-level
features. Genetic programming (GP) using a tree-based
representation can be used for both feature construction and implicit feature selection. This work presents
a comprehensive study to investigate the use of GP for
feature construction and selection on high-dimensional
classification problems. Different combinations of the
constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used
to evaluate their performance. The results show that
the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or
even increase the classification accuracy in most cases.
The cases where overfitting occurred are analysed via
the distribution of features. Further analysis is also performed to show why the constructed feature can achieve promising classification performance.
This is a post-peer-review, pre-copyedit version of an article published in 'Memetic Computing'. The final authenticated version is available online at: https://doi.org/10.1007/s12293-015-0173-y. The following terms of use apply: https://www.springer.com/gp/open-access/publication-policies/aam-terms-of-use.
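To illustrate how a single tree-based GP individual can act as both a constructed feature and an implicit feature selector, as the abstract describes, here is a minimal sketch. It is not the authors' implementation: the tree is written by hand rather than evolved, and the function set (+, -, *, protected division), the tuple encoding, and the toy data values are all assumptions made for illustration. The tree's output on each instance is the constructed feature; the original features appearing at its leaves are the implicitly selected ones.

```python
from typing import Sequence, Set, Tuple, Union

# A tree is either a feature index (terminal) or a tuple (operator, left, right).
# This encoding is an assumption for illustration, not the paper's representation.
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def protected_div(a: float, b: float) -> float:
    """Division that returns 1.0 when the denominator is zero (a common GP convention)."""
    return a / b if b != 0 else 1.0


# Assumed function set; the paper's actual function set may differ.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def evaluate(tree: Tree, row: Sequence[float]) -> float:
    """Compute the constructed feature value for one instance."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def used_features(tree: Tree) -> Set[int]:
    """Collect the original feature indices that appear at the tree's leaves."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return used_features(left) | used_features(right)


# Hypothetical hand-written tree: (x3 * x0) - (x7 / x3)
tree: Tree = ("-", ("*", 3, 0), ("/", 7, 3))

# Toy data with made-up values: 2 instances, 10 original features.
X = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    [2.0, 0.0, 1.0, 0.0, 3.0, 1.0, 4.0, 6.0, 2.0, 5.0],
]

constructed = [evaluate(tree, row) for row in X]  # one new high-level feature
selected = sorted(used_features(tree))            # implicitly selected features

print(constructed)  # [2.0, -1.0]
print(selected)     # [0, 3, 7]
```

In this sketch the ten original features are reduced either to one constructed feature or to the three selected features, and the two can be combined; this mirrors the idea of comparing different combinations of constructed and/or selected features, though the feature sets, datasets, and classifiers in the article itself are not reproduced here.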
History
Preferred citation: Tran, B., Xue, B. & Zhang, M. (2016). Genetic programming for feature construction and selection in classification on high-dimensional data. Memetic Computing, 8(1), 3-15. https://doi.org/10.1007/s12293-015-0173-y
Publisher DOI: https://doi.org/10.1007/s12293-015-0173-y
Journal title: Memetic Computing
Volume: 8
Issue: 1
Publication date: 2016-03-01
Pagination: 3-15
Publisher: Springer Science and Business Media LLC
Publication status: Published
Contribution type: Article
Online publication date: 2015-12-19
ISSN: 1865-9284
eISSN: 1865-9292
Article number: 1
Language: en
Keywords: Genetic programming; Feature construction; Feature selection; Classification; High-dimensional data; Science & Technology; Technology; Computer Science, Artificial Intelligence; Operations Research & Management Science; Computer Science; CANCER-DIAGNOSIS; ALGORITHM; OPTIMIZATION; CLASSIFIERS; Medical Biotechnology; Artificial Intelligence and Image Processing; Computer Software