A Genetic Programming Slant on the Way to Record De-Duplication in Repositories
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{Preethy:2013:IJIET,
author = "S. Preethy and A. Daniel Das",
title = "A Genetic Programming Slant on the Way to Record
De-Duplication in Repositories",
journal = "International Journal of Innovations in Engineering
and Technology (IJIET)",
year = "2013",
volume = "2",
number = "2",
pages = "60--64",
month = apr,
keywords = "genetic algorithms, genetic programming,
de-duplication, computation",
annote = "The Pennsylvania State University CiteSeerX Archives",
bibsource = "OAI-PMH server at citeseerx.ist.psu.edu",
language = "en",
oai = "oai:CiteSeerX.psu:10.1.1.305.1403",
rights = "Metadata may be used without restrictions as long as
the oai identifier remains attached to it.",
URL = "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.305.1403",
URL = "http://ijiet.com/wp-content/uploads/2013/05/10.pdf",
ISSN = "2319-1058",
URL = "http://ijiet.com/issues/volume-2-issue-2-april-2013/",
size = "5 pages",
abstract = "Several systems that rely on consistent data to offer
high-quality services, such as digital libraries and
e-commerce brokers, may be affected by the existence of
duplicates, quasi replicas, or near-duplicate entries
in their repositories. Because of that, there have been
significant investments from private and government
organisations for developing methods for removing
replicas from their data repositories. This is due to the
fact that clean and replica-free repositories not only
allow the retrieval of higher quality information but
also lead to more concise data and to potential savings
in computational time and resources to process this
data. In this paper, we propose a genetic programming
approach to record de-duplication that combines
several different pieces of evidence extracted from the
data content to find a de-duplication function that is
able to identify whether two entries in a repository
are replicas or not. As shown by our experiments, our
approach outperforms an existing state-of-the-art
method found in the literature. Moreover, the suggested
functions are computationally less demanding since they
use less evidence. In addition, our genetic
programming approach is capable of automatically
adapting these functions to a given fixed replica
identification boundary, freeing the user from the
burden of having to choose and tune this parameter.",
notes = "Department of Information Technology N.P.R.College of
Engineering and Technology, Dindigul, Tamilnadu,
India
Department of Mechanical Engineering N.P.R.College of
Engineering and Technology, Dindigul, Tamilnadu,
India",
}