Aggressive and Effective Feature Selection using Genetic Programming
Created by W. Langdon from gp-bibliography.bib Revision:1.8051
@InProceedings{Sandin:2012:CEC,
title = "Aggressive and Effective Feature Selection using
Genetic Programming",
author = "Isac Sandin and Guilherme Andrade and
Felipe Viegas and Daniel Madeira and Leonardo Rocha and
Thiago Salles and Marcos Andre Goncalves",
pages = "2718--2725",
booktitle = "Proceedings of the 2012 IEEE Congress on Evolutionary
Computation",
year = "2012",
editor = "Xiaodong Li",
month = "10-15 " # jun,
DOI = "doi:10.1109/CEC.2012.6252878",
address = "Brisbane, Australia",
ISBN = "0-7803-8515-2",
keywords = "genetic algorithms, genetic programming, data mining,
learning classifier systems",
abstract = "One of the major challenges in automatic
classification is dealing with high-dimensional data.
Several dimensionality reduction strategies, including
popular feature selection metrics such as Information
Gain and Chi-squared, have already been proposed to
deal with this situation. However, these strategies are
not well suited to very skewed data, a common
situation in real-world data sets. Skew occurs when the
number of samples in one class is much larger than in
the others, causing common feature selection metrics to be
biased towards the features observed in the largest
class. In this paper, we propose the use of Genetic
Programming (GP) to implement an aggressive, yet very
effective, selection of attributes. Our GP-based
strategy is able to greatly reduce dimensionality
while dealing effectively with skewed data. To this
end, we exploit some of the most common feature
selection metrics and, with GP, combine their results
into new sets of features, obtaining a better, less biased
estimate of the discriminative power of each feature.
Our proposal was evaluated against each individual
feature selection metric used in our GP-based solution
(namely, Information Gain, Chi-squared, Odds Ratio and
Correlation Coefficient) using the k8 cancer-rescue
mutants data set, a very unbalanced collection
of p53 protein examples. For this data
set, our solution not only increases the efficiency of
the learning algorithms, through an aggressive reduction
of the input space, but also significantly increases
their accuracy.",
notes = "WCCI 2012. CEC 2012 - A joint meeting of the IEEE, the
EPS and the IET.",
}
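The abstract describes scoring features with four standard metrics (Information Gain, Chi-squared, Odds Ratio, Correlation Coefficient) and letting GP combine those scores into a new ranking, then keeping only a few features. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Assumptions: a binary target class with per-feature 2x2 presence counts (a, b, c, d); the signed correlation coefficient form commonly used in text categorization (whose square equals Chi-squared); +0.5 smoothing for the odds ratio; protected division; and a fixed, hypothetical expression tree standing in for one GP individual. All helper names (information_gain, select_features, and so on) are illustrative. The evolutionary loop and its fitness function are omitted.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical count layout for one feature and the target class:
# (a, b, c, d) = (present & in class, present & not in class,
#                 absent & in class,  absent & not in class)
Counts = Tuple[int, int, int, int]
# An expression tree is a metric name (leaf) or a tuple (op, left, right).
Tree = Union[str, tuple]


def _entropy(*probs: float) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """Class entropy minus class entropy conditioned on feature presence."""
    n = a + b + c + d
    p_class = (a + c) / n
    h_cond = 0.0
    for in_cls, out_cls in ((a, b), (c, d)):  # feature present, then absent
        m = in_cls + out_cls
        if m:
            h_cond += (m / n) * _entropy(in_cls / m, out_cls / m)
    return _entropy(p_class, 1 - p_class) - h_cond


def correlation_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Signed coefficient: sqrt(N) * (ad - cb) / sqrt(product of marginals)."""
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - c * b) / denom


def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic; it equals the square of the coefficient above."""
    return correlation_coefficient(a, b, c, d) ** 2


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """ln(ad / bc), with +0.5 smoothing (an assumption) to avoid zeros."""
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


METRICS: Dict[str, Callable[[int, int, int, int], float]] = {
    "ig": information_gain,
    "chi2": chi_squared,
    "or": log_odds_ratio,
    "cc": correlation_coefficient,
}

OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y if y != 0 else 1.0,  # protected division
}


def evaluate(tree: Tree, scores: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree over one feature's metric scores."""
    if isinstance(tree, str):
        return scores[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, scores), evaluate(right, scores))


def select_features(table: Dict[str, Counts], tree: Tree, k: int) -> List[str]:
    """Score every feature with the tree and keep only the k best (aggressive cut)."""
    combined = {}
    for name, counts in table.items():
        scores = {key: metric(*counts) for key, metric in METRICS.items()}
        combined[name] = evaluate(tree, scores)
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Made-up, skewed toy counts: 10 samples in the class, 990 outside it.
toy = {
    "f1": (8, 20, 2, 970),
    "f2": (1, 400, 9, 590),
    "f3": (5, 5, 5, 985),
}
# Hypothetical individual; a GP run would evolve trees like this one.
individual = ("+", ("*", "cc", "or"), "ig")
print(select_features(toy, individual, k=2))  # -> ['f3', 'f1']
```

Protected division (returning 1 when the divisor is 0) is a common GP convention that keeps every randomly generated tree evaluable; which operators, terminals and fitness measure the paper's GP actually used is not stated in this entry.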
Genetic Programming entries for
Isac Sandin
Guilherme Andrade
Felipe Viegas
Daniel Madeira
Leonardo Rocha
Thiago Cunha de Moura Salles
Marcos Andre Goncalves