Symbiotic evolutionary subspace clustering (S-ESC)

author = "Ali R. Vahdat",

title = "Symbiotic evolutionary subspace clustering {(S-ESC)}",

school = "Dalhousie University",

year = "2012",

address = "Halifax, Nova Scotia, Canada",

month = nov,

keywords = "genetic algorithms, genetic programming",

URL = "https://web.cs.dal.ca/~mheywood/Thesis/PhD.html",

URL = "http://hdl.handle.net/10222/40629",

URL = "https://dalspace.library.dal.ca/bitstream/handle/10222/40629/Vahdat-Ali-PhD-CS-Nov2013.pdf",

size = "174 pages",

abstract = "Application domains with large attribute spaces, such as genomics and text analysis, necessitate clustering algorithms with more sophistication than traditional clustering algorithms. More sophisticated approaches are required to cope with the large dimensionality and cardinality of these data sets. Subspace clustering, a generalisation of traditional clustering, identifies the attribute support for each cluster as well as the location and number of clusters. In the most general case, attributes associated with each cluster could be unique. The proposed algorithm, Symbiotic Evolutionary Subspace Clustering (S-ESC), borrows from symbiosis in the sense that each clustering solution is defined in terms of a host (a single member of the host population) and a number of coevolved cluster centroids (or symbionts in an independent symbiont population). Symbionts define clusters and therefore attribute subspaces, whereas hosts define sets of clusters to constitute a non-degenerate solution. The symbiotic representation of S-ESC is the key to making it scalable to high-dimensional data sets, while an integrated subsampling process makes it scalable to tasks with a large number of data items. A bi-objective evolutionary method is proposed to identify the unique attribute support of each cluster while detecting its data instances. Benchmarking is performed against a well-known test suite of subspace clustering data sets with four well-known comparator algorithms from both the full-dimensional and subspace clustering literature (EM, MINECLUS, PROCLUS and STATPC) and a generic genetic algorithm-based subspace clustering algorithm. Performance of the S-ESC algorithm was found to be robust across a wide cross-section of properties with a common parameterisation used throughout. This was not the case for the comparator algorithms. Specifically, their performance could be sensitive to a particular data distribution, or parameter sweeps might be necessary to provide comparable performance.
A comparison is also made relative to a non-symbiotic genetic algorithm. In this case each individual represents the set of clusters comprising a subspace cluster solution. Benchmarking indicates that the proposed symbiotic framework is again superior. The S-ESC code and data sets are publicly available.",

notes = "Is this GP?

Supervisor: Malcolm I. Heywood",