abstract = "Application domains with large attribute spaces, such
as genomics and text analysis, necessitate clustering
algorithms more sophisticated than traditional
clustering algorithms in order to cope with the large
dimensionality and cardinality of these data sets.
Subspace clustering, a
generalisation of traditional clustering, identifies
the attribute support for each cluster as well as the
location and number of clusters. In the most general
case, attributes associated with each cluster could be
unique. The proposed algorithm, Symbiotic Evolutionary
Sub-space Clustering (S-ESC), borrows from symbiosis in
the sense that each clustering solution is defined in
terms of a host (a single member of the host
population) and a number of coevolved cluster centroids
(or symbionts in an independent symbiont population).
Symbionts define clusters and therefore attribute
subspaces, whereas hosts define sets of clusters that
together constitute a non-degenerate solution. The symbiotic
representation of S-ESC is the key to making it
scalable to high-dimensional data sets, while an
integrated subsampling process makes it scalable to
tasks with a large number of data items. A bi-objective
evolutionary method is proposed to identify the unique
attribute support of each cluster while detecting its
data instances. Benchmarking is performed against a
well-known test suite of subspace clustering data sets
with four well-known comparator algorithms from both
the full-dimensional and subspace clustering
literature (EM, MINECLUS, PROCLUS and STATPC), as well
as a generic genetic algorithm-based subspace clustering method.
Performance of the S-ESC algorithm was found to be
robust across a wide cross-section of properties with a
common parameterisation used throughout. This was not
the case for the comparator algorithms: their
performance could be sensitive to a particular data
distribution, or parameter sweeps might be necessary to
provide comparable performance. A comparison is also
made relative to a non-symbiotic genetic algorithm. In
this case, each individual represents the full set of
clusters comprising a subspace clustering solution.
Benchmarking indicates that the proposed symbiotic
framework is superior in this case as well. The S-ESC
code and data sets are publicly available.",