
Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web

  • Research Paper
  • Journal: Evolutionary Intelligence

Abstract

In the past decade, the Semantic Web data community has focused on publishing and interlinking data. Data publication is now a widespread activity, but more effort needs to be devoted to interlinking data sources. Organizations publish data under different curation and publication policies, which has led to a proliferation of data sources and, in turn, to several challenges in interlinking them: different data sources use different properties and descriptions for the same entity. The entity linking problem is at the core of data interlinking; it identifies and links instances or records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. They rely on labeled data to learn a good model and suffer when labeled data is absent. Labeling is costly, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate labeled data automatically. The automatically generated labels are then used to train an underlying genetic programming based linkage rule learning model. The proposed approach scales to large datasets and achieves performance comparable to supervised approaches while eliminating the need for manually labeled data. It works in an unsupervised (fully automatic) way while retaining the advantages of supervised approaches, such as high accuracy and low complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
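
The abstract does not say which heuristic produces the labels, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a simple threshold heuristic (very similar pairs become positives, very dissimilar pairs become negatives) and scores a candidate linkage rule by its F-measure on those pseudo-labels, as a genetic programming fitness function might. The function names, the thresholds `hi` and `lo`, and the use of `difflib` as the similarity measure are all hypothetical choices, not the authors' method.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (stand-in for any string measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pseudo_label(source, target, hi=0.9, lo=0.3):
    """Assumed heuristic: very similar pairs become positives, very dissimilar
    pairs become negatives, and everything in between stays unlabeled."""
    positives, negatives = [], []
    for s, t in product(source, target):
        score = similarity(s, t)
        if score >= hi:
            positives.append((s, t))
        elif score <= lo:
            negatives.append((s, t))
    return positives, negatives


def f_measure(rule, positives, negatives):
    """Fitness of a candidate linkage rule measured against the pseudo-labels."""
    tp = sum(1 for s, t in positives if rule(s, t))
    fp = sum(1 for s, t in negatives if rule(s, t))
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a rule learner of this kind, `f_measure` would be evaluated for every candidate rule in the population, with the pseudo-labels taking the place of a manually curated reference link set.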


Notes

  1. Online available at http://lod-cloud.net.

  2. https://github.com/owlcs/owlapi.

  3. https://github.com/Simmetrics/simmetrics.


Author information


Corresponding author

Correspondence to Amit Singh.


Cite this article

Singh, A., Sharan, A. Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web. Evol. Intel. 12, 609–632 (2019). https://doi.org/10.1007/s12065-019-00263-0


  • DOI: https://doi.org/10.1007/s12065-019-00263-0
