Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods

Yinle Zhou, Ali Kooshesh, John Talburt

Source Title: International Journal of Business Intelligence Research (IJBIR) 3(1)

ISSN: 1947-3591 | EISSN: 1947-3605 | EISBN13: 9781466611078 | DOI: 10.4018/jbir.2012010105

MLA

Zhou, Yinle, et al. "Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods." IJBIR, vol. 3, no. 1, 2012, pp. 72-82. http://doi.org/10.4018/jbir.2012010105

APA

Zhou, Y., Kooshesh, A., & Talburt, J. (2012). Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods. International Journal of Business Intelligence Research (IJBIR), 3(1), 72-82. http://doi.org/10.4018/jbir.2012010105

Chicago

Zhou, Yinle, Ali Kooshesh, and John Talburt. "Optimizing the Accuracy of Entity-Based Data Integration of Multiple Data Sources Using Genetic Programming Methods." International Journal of Business Intelligence Research (IJBIR) 3, no. 1 (2012): 72-82. http://doi.org/10.4018/jbir.2012010105

Abstract

Entity-based data integration (EBDI) is a form of data integration in which information about the same real-world entity is collected from different sources and merged. The sources often disagree on the value of a common attribute. Such conflicts are typically resolved by a rule that selects one of the non-null values the sources present. One of the most commonly used selection rules is the naïve selection operator, which chooses the non-null value provided by the source with the highest overall accuracy for the attribute in question. However, the naïve selection operator does not always produce the most accurate result. This paper describes a method for automatically generating a selection operator using genetic programming. It also presents results from a series of experiments on synthetic data indicating that the method yields a more accurate selection operator than either the naïve or the naïve-voting selection operator.
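As a rough illustration of the naïve selection operator described in the abstract, the Python sketch below picks the non-null value from the source with the highest accuracy for one attribute. It is not the authors' implementation: the function name `naive_select`, the source labels, and the accuracy figures are all hypothetical, and per-source accuracy is assumed to be known in advance.

```python
from typing import Dict, Optional


def naive_select(values: Dict[str, Optional[str]],
                 accuracy: Dict[str, float]) -> Optional[str]:
    """Return the non-null value offered by the most accurate source.

    `values` maps a source name to the value it reports for one attribute
    (None means the source has no value). `accuracy` maps a source name to
    its assumed overall accuracy for that attribute. Both are hypothetical
    inputs chosen for illustration.
    """
    # Only sources that actually supply a value can be selected.
    candidates = [src for src, val in values.items() if val is not None]
    if not candidates:
        return None
    # Pick the candidate source with the highest accuracy.
    best = max(candidates, key=lambda src: accuracy[src])
    return values[best]


# Hypothetical example: source A is the most accurate but has no value,
# so the operator falls back to the next most accurate source, B.
values = {"A": None, "B": "12 Oak St", "C": "12 Oak Street"}
accuracy = {"A": 0.95, "B": 0.80, "C": 0.70}
print(naive_select(values, accuracy))  # 12 Oak St
```

The sketch never compares the competing values with each other, so a less accurate source that happens to be right on a particular record is always overruled. That is one way to see why this operator may not give the most accurate result.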
