SOAP: Semantic outliers automatic preprocessing
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{TRUJILLO:2020:IS,
author = "Leonardo Trujillo and Uriel Lopez and
Pierrick Legrand",
title = "{SOAP:} Semantic outliers automatic preprocessing",
journal = "Information Sciences",
volume = "526",
pages = "86--101",
year = "2020",
ISSN = "0020-0255",
DOI = "doi:10.1016/j.ins.2020.03.071",
URL = "http://www.sciencedirect.com/science/article/pii/S0020025520302516",
keywords = "genetic algorithms, genetic programming, Outliers,
Semantics, Robust regression",
abstract = "Genetic Programming (GP) is an evolutionary algorithm
for the automatic generation of symbolic models
expressed as syntax trees. GP has been successfully
applied in many domains, but most research in this area
has not considered the presence of outliers in the
training set. Outliers make supervised learning
problems difficult, and sometimes impossible, to solve.
For instance, robust regression methods cannot handle
more than 50 percent of outlier contamination, referred
to as their breakdown point. This paper studies
problems where outlier contamination is high, reaching
up to 90 percent contamination levels, extreme cases
that can appear in some domains. This work shows, for
the first time, that a random population of GP
individuals can detect outliers in the output variable.
From this property, a new filtering algorithm is
proposed called Semantic Outlier Automatic
Preprocessing (SOAP), which can be used with any
learning algorithm to differentiate between inliers and
outliers. Since the method uses a GP population, the
algorithm can be carried out for free in a GP symbolic
regression system. The approach is the only method that
can perform such an automatic cleaning of a dataset
without incurring an exponential cost as the percentage
of outliers in the dataset increases.",
}