Record Deduplication By Evolutionary Means
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Misc{oai:CiteSeerX.psu:10.1.1.604.610,
author = "Marco Modesto and Moises G. {de Carvalho} and
Walter {dos Santos}",
title = "Record Deduplication By Evolutionary Means",
howpublished = "CiteSeerX",
year = "2002?",
address = "Departamento de Ciencia da Computacao, Universidade
Federal de Minas Gerais, Belo Horizonte, MG, Brazil",
keywords = "genetic algorithms, genetic programming",
annote = "The Pennsylvania State University CiteSeerX Archives",
bibsource = "OAI-PMH server at citeseerx.ist.psu.edu",
language = "en",
oai = "oai:CiteSeerX.psu:10.1.1.604.610",
rights = "Metadata may be used without restrictions as long as
the oai identifier remains attached to it.",
URL = "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.604.610",
URL = "http://homepages.dcc.ufmg.br/~nivio/cursos/pa06/seminarios/seminario16/seminario16.pdf",
size = "5 pages",
abstract = "Identifying record replicas in digital data
repositories is a key step to improve the quality of
content and services available, as well as to yield
eventual sharing efforts. Several deduplication
strategies are available, but most of them rely on
manually chosen settings to combine evidence used to
identify records as being replicas. In this work, we
present the results of experiments we have carried out
with a Machine Learning approach for the deduplication
problem. Our approach is based on Genetic Programming
(GP), which is able to automatically generate similarity
functions to identify record replicas in a given
repository. The generated similarity functions properly
combine and weight the best evidence available among
the record fields in order to tell when two distinct
records represent the same real-world entity. In
previous work, fixed similarity functions were
associated with each piece of evidence. In the present
work, GP is also used to choose the best associations
between evidence and similarity functions. The results of the
experiments show that our approach outperforms the
baseline method by Fellegi and Sunter. It also
outperforms the previous GP results, which used fixed
evidence associations, when identifying replicas in a
data set containing researchers' personal data.",
notes = "Oct 2017 reference by
http://homepages.dcc.ufmg.br/~nivio/cursos/pa06/seminarios/",
}
Genetic Programming entries for
Marco Modesto
Moises G de Carvalho
Walter dos Santos