Classification Algorithms for Big Data over distributed processing frameworks
Created by W.Langdon from gp-bibliography.bib Revision:1.8051
@PhdThesis{Segatori:thesis,
author = "Armando Segatori",
title = "Classification Algorithms for Big Data over
distributed processing frameworks",
school = "Pisa University",
year = "2016",
address = "Italy",
note = "Scuola di Dottorato in Ingegneria ``Leonardo da
Vinci''",
keywords = "genetic algorithms, genetic programming, MapReduce",
URL = "https://etd.adm.unipi.it/t/etd-05112016-155709/",
URL = "https://etd.adm.unipi.it/theses/available/etd-05112016-155709/unrestricted/Segatori_PhD_Thesis.pdf",
size = "169 pages",
abstract = "Classification problems have been widely studied in
the context of data mining and different approaches to
address these problems have been developed over the past
decades. Among them, associative classification and
decision trees have proved to be very effective and
have been successfully employed in several application
domains. Furthermore, some of these approaches have
integrated fuzzy set theory with the objective of
dealing with uncertain and noisy data. Unfortunately,
most of the approaches proposed up to now have been
designed for maximizing accuracy, often neglecting
complexity in terms of both memory and execution
time. Thus, these approaches are generally not able to
adequately handle the so-called ``big data''. In this
Ph.D. thesis, we propose different solutions in a
distributed environment for generating accurate and
interpretable classification models for big data. In
particular, we focus on associative classification and
decision trees, integrating our solutions with fuzzy
set theory. Since the generation of such models
requires that continuous features are discretized, we
also propose a novel distributed discretization
approach based on information entropy. This approach
has then been extended with fuzzy logic to
generate fuzzy partitions. Finally, considering the
complexity of the models generated by previous
solutions, we propose a distributed evolutionary
approach for optimizing both accuracy and
interpretability of the classifiers. The proposed
algorithms are shaped according to the MapReduce
programming model and have been deployed on well-known
data processing frameworks, widely employed in research
as well as industrial contexts. The performance
evaluation has been carried out using several big
data benchmarks, and the results obtained by the
proposed approaches and by some state-of-the-art
distributed classification algorithms have been
extensively discussed in terms of accuracy, model
complexity, and computation time.",
notes = "is this GP?
embargoed",
}