Using feature construction to avoid large feature spaces in text classification
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{Mayfield:2010:gecco,
author = "Elijah Mayfield and Carolyn Penstein-Rose",
title = "Using feature construction to avoid large feature
spaces in text classification",
booktitle = "GECCO '10: Proceedings of the 12th annual conference
on Genetic and evolutionary computation",
year = "2010",
editor = "Juergen Branke and Martin Pelikan and Enrique Alba and
Dirk V. Arnold and Josh Bongard and
Anthony Brabazon and Martin V. Butz and
Jeff Clune and Myra Cohen and Kalyanmoy Deb and
Andries P Engelbrecht and Natalio Krasnogor and
Julian F. Miller and Michael O'Neill and Kumara Sastry and
Dirk Thierens and Jano {van Hemert} and Leonardo Vanneschi and
Carsten Witt",
isbn13 = "978-1-4503-0072-8",
pages = "1299--1306",
keywords = "genetic algorithms, genetic programming, NLP, Natural
Language Processing, Text analysis, SVM",
month = "7-11 " # jul,
organisation = "SIGEVO",
address = "Portland, Oregon, USA",
DOI = "10.1145/1830483.1830714",
publisher = "ACM",
publisher_address = "New York, NY, USA",
abstract = "Feature space design is a critical part of machine
learning. This is an especially difficult challenge in
the field of text classification, where an arbitrary
number of features of varying complexity can be
extracted from documents as a preprocessing step. A
challenge for researchers has consistently been to
balance expressiveness of features with the size of the
corresponding feature space, due to issues with data
sparsity that arise as feature spaces grow larger.
Drawing on past successes with genetic programming in
similar problems outside of text classification, we
propose and implement a technique for constructing
complex features from simpler features, and adding
these more complex features into a combined feature
space which can then be used by more sophisticated
machine learning classifiers. Applying this technique
to a sentiment analysis problem, we show encouraging
improvement in classification accuracy, with a small
and constant increase in feature space size. We also
show that the features we generate carry far more
predictive power than any of the simple features they
contain.",
notes = "Also known as \cite{1830714}. GECCO-2010: A joint
meeting of the nineteenth international conference on
genetic algorithms (ICGA-2010) and the fifteenth annual
genetic programming conference (GP-2010)",
}