Finding Relevant Attributes in High Dimensional Data: A Distributed Computing Hybrid Data Mining Strategy
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InCollection{DBLP:journals/trs/ValdesB07,
  author = "Julio J. Valdes and Alan J. Barton",
  title = "Finding Relevant Attributes in High Dimensional Data:
    A Distributed Computing Hybrid Data Mining Strategy",
  year = "2007",
  booktitle = "Transactions on Rough Sets VI",
  publisher = "Springer",
  volume = "4374",
  series = "Lecture Notes in Computer Science",
  keywords = "genetic algorithms, genetic programming",
  pages = "366--396",
  DOI = "doi:10.1007/978-3-540-71200-8_20",
  bibsource = "DBLP, http://dblp.uni-trier.de",
  isbn13 = "978-3-540-71198-8",
  abstract = "In many domains the data objects are described in
    terms of a large number of features (e.g. microarray
    experiments, or spectral characterizations of organic
    and inorganic samples). A pipelined approach using two
    clustering algorithms in combination with Rough Sets is
    investigated for the purpose of discovering important
    combinations of attributes in high dimensional data.
    The Leader and several k-means algorithms are used as
    fast procedures for attribute set simplification of the
    information systems presented to the rough sets
    algorithms. The data described in terms of these fewer
    features are then discretized with respect to the
    decision attribute according to different rough set
    based schemes. From them, the reducts and their derived
    rules are extracted, which are applied to test data in
    order to evaluate the resulting classification accuracy
    in crossvalidation experiments. The data mining process
    is implemented within a high throughput distributed
    computing environment. Nonlinear transformation of
    attribute subsets preserving the similarity structure
    of the data were also investigated. Their
    classification ability, and that of subsets of
    attributes obtained after the mining process were
    described in terms of analytic functions obtained by
    genetic programming (gene expression programming), and
    simplified using computer algebra systems. Visual data
    mining techniques using virtual reality were used for
    inspecting results. An exploration of this approach
    (using Leukemia, Colon cancer and Breast cancer gene
    expression data) was conducted in a series of
    experiments. They led to small subsets of genes with
    high discrimination power.",
}