A Detection of Duplicate Records from Multiple Web Databases using pattern matching in UDD
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{Bharambe:2013:ijetae,
author = "Dewendra Bharambe and Susheel Jain and Anurag Jain",
title = "A Detection of Duplicate Records from Multiple Web
Databases using pattern matching in UDD",
journal = "International Journal of Emerging Technology and
Advanced Engineering",
year = "2013",
volume = "3",
number = "5",
pages = "412--417",
month = may,
keywords = "genetic algorithms, genetic programming, data
deduplication, UDD, SVM, WCSS, genetic algorithm,
pattern matching",
ISSN = "2250-2459",
annote = "The Pennsylvania State University CiteSeerX Archives",
bibsource = "OAI-PMH server at citeseerx.ist.psu.edu",
language = "en",
oai = "oai:CiteSeerX.psu:10.1.1.413.7928",
rights = "Metadata may be used without restrictions as long as
the oai identifier remains attached to it.",
URL = "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.413.7928",
URL = "http://www.ijetae.com/files/Volume3Issue5/IJETAE_0513_68.pdf",
URL = "http://www.ijetae.com/Volume3Issue5.html",
abstract = "Record matching, the task of finding entries that
refer to the same entity in two or more files, is a
vital process in data integration. Most supervised
record matching methods require training data provided
by users. Such methods cannot be applied in the web
database scenario, where query results are generated
dynamically. In the existing system, an unsupervised
record matching method effectively identifies
duplicates among the query result records of multiple
web databases by first identifying the duplicate and
non-duplicate sets in the source and then searching
the non-duplicate set again for remaining duplicates.
It then uses two cooperative classifiers, starting
from the non-duplicate set: a Weighted Component
Similarity Summing (WCSS) classifier and a Support
Vector Machine (SVM) classifier. These two classifiers
are applied iteratively to identify duplicates in the
query results from multiple web databases. In this
paper we modify the record matching algorithm with a
genetic algorithm. Genetic programming is
time-consuming, so we propose UDD with genetic
programming. The accuracy of UDD and of UDD with a
genetic algorithm is evaluated on a dataset containing
duplicates.",
notes = "Article 68.",
}
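The abstract describes the WCSS classifier only in words. Below is a minimal Python sketch of a single WCSS pass, written under our own reading of UDD and not taken from this paper. Assumptions: each record pair is already reduced to a vector of per-field similarities in [0, 1]. Fields on which known non-duplicates disagree get larger weights. Once duplicates have been found, their field-agreement weights are blended in with a mixing factor. A pair is labelled a duplicate when its weighted similarity sum reaches a threshold. The function names `nondup_weights`, `dup_weights`, `wcss_score` and `wcss_round`, and the values `threshold=0.85` and `alpha=0.5`, are hypothetical. The SVM step, the iteration loop, and the genetic-algorithm modification are not shown.

```python
from typing import List, Sequence

# Hypothetical representation: one similarity score in [0, 1] per field.
Vector = Sequence[float]


def nondup_weights(non_dups: List[Vector]) -> List[float]:
    """Weight fields by how much known non-duplicate pairs DISagree on them."""
    n_fields = len(non_dups[0])
    q = [sum(1.0 - v[i] for v in non_dups) for i in range(n_fields)]
    total = sum(q)
    if total == 0:
        # Assumed fallback: uniform weights if no field ever disagrees.
        return [1.0 / n_fields] * n_fields
    return [qi / total for qi in q]


def dup_weights(dups: List[Vector]) -> List[float]:
    """Weight fields by how much already-found duplicate pairs agree on them."""
    n_fields = len(dups[0])
    p = [sum(v[i] for v in dups) for i in range(n_fields)]
    total = sum(p)
    if total == 0:
        return [1.0 / n_fields] * n_fields
    return [pi / total for pi in p]


def wcss_score(vec: Vector, weights: Sequence[float]) -> float:
    """Weighted component similarity sum for one record pair."""
    return sum(w * v for w, v in zip(weights, vec))


def wcss_round(
    candidates: List[Vector],
    dups: List[Vector],
    non_dups: List[Vector],
    threshold: float = 0.85,  # hypothetical decision threshold
    alpha: float = 0.5,  # hypothetical mix of duplicate/non-duplicate weights
) -> List[Vector]:
    """Return the candidate pairs that one WCSS pass labels as duplicates."""
    w_n = nondup_weights(non_dups)
    if dups:
        w_d = dup_weights(dups)
        weights = [alpha * d + (1 - alpha) * n for d, n in zip(w_d, w_n)]
    else:
        # First pass: no duplicates known yet, so use non-duplicate weights only.
        weights = w_n
    return [v for v in candidates if wcss_score(v, weights) >= threshold]
```

For example, with non-duplicate vectors `[[0.1, 0.9], [0.2, 0.8]]` the first field receives weight 0.85 and the second 0.15. The candidate `[0.95, 0.5]` then scores about 0.88 and is kept, while `[0.3, 1.0]` scores about 0.41 and is rejected. In a full UDD-style loop, the kept pairs would be added to the duplicate set before the next round.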
Genetic Programming entries for
Dewendra Onkar Bharambe
Susheel Jain
Anurag Jain