booktitle = "International Conference on Information Communication
and Embedded Systems (ICICES 2014)",
title = "An evolutional approach for record deduplication and
improving accuracy level in large repositories",
year = "2014",
month = feb,
abstract = "This paper shows how genetic programming (GP) can be
used for record de-duplication. Many systems rely on
data integrity to offer high-quality services, and this
integrity may be compromised by near-replica,
quasi-replica, or replica entries in their
repositories. Consequently, private and public
organisations have put considerable effort into developing
effective methods for removing duplicates from large
data repositories, because clean, duplicate-free
repositories not only allow the retrieval of
higher-quality data but also lead to a more concise
data representation and to savings in the execution
time and resources needed to process the data. In this
paper, we improve the results of a GP-based approach
for record de-duplication by carrying out a
comprehensive set of experiments on its
parametrisation setup. Our experiments show that
suitable parameter settings can improve the results, in
terms of efficiency and accuracy, by up to 30 percent.
Thus, the obtained results can be used to suggest, in
advance, the best way to set up the parameters of our
GP-based approach to record de-duplication.",