Concise Pattern Learning for RDF Data Sets Interlinking
Created by W.Langdon from
gp-bibliography.bib Revision:1.8081
@PhdThesis{Zhengjie_Fan:thesis,
title = "Concise Pattern Learning for {RDF} Data Sets
Interlinking",
titletranslation = "Apprentissage de Motifs Concis pour le Liage de
Donn{\'e}es RDF",
author = "Zhengjie Fan",
year = "2014",
school = "Universit{\'e} de Grenoble",
address = "France",
month = "7 " # aug,
keywords = "genetic algorithms, genetic programming, interlinking,
ontology matching, machine learning",
annote = "Computer mediated exchange of structured knowledge
(EXMO) ; Inria Grenoble - Rh{\^o}ne-Alpes ; INRIA -
INRIA - Laboratoire d'Informatique de Grenoble (LIG) ;
CNRS - Universit{\'e} Pierre Mend{\`e}s France
(Grenoble 2 UPMF) - Institut National Polytechnique de
Grenoble (INPG) - Universit{\'e} Joseph Fourier
(Grenoble 1 UJF) - CNRS - Universit{\'e} Pierre
Mend{\`e}s France (Grenoble 2 UPMF) - Institut National
Polytechnique de Grenoble (INPG) - Universit{\'e}
Joseph Fourier (Grenoble 1 UJF); Universit{\'e} de
Grenoble; J{\'e}r{\^o}me
Euzenat(jerome.euzenat@inria.fr)",
bibsource = "OAI-PMH server at api.archives-ouvertes.fr",
contributor = "Computer mediated exchange of structured knowledge and
J{\'e}r{\^o}me Euzenat and Datalift",
identifier = "tel-00986104",
language = "english",
oai = "oai:HAL:tel-00986104v1",
rights = "info:eu-repo/semantics/openAccess",
type = "info:eu-repo/semantics/doctoralThesis; Theses",
URL = "https://tel.archives-ouvertes.fr/tel-00986104",
URL = "https://tel.archives-ouvertes.fr/tel-00986104/document",
URL = "https://tel.archives-ouvertes.fr/tel-00986104/file/Thesis.pdf",
size = "169 pages",
abstract = "There are many data sets being published on the web
with Semantic Web technology. The data sets contain
analogous data which represent the same resources in
the world. If these data sets are linked together by
correctly building links, users can conveniently query
data through a uniform interface, as if they were
querying one data set. However, finding correct links
is very challenging because there are many instances to
compare. Many existing solutions have been proposed for
this problem. (1) One straightforward idea is to
compare the attribute values of instances for
identifying links, yet it is impossible to compare all
possible pairs of attribute values. (2) Another common
strategy is to compare instances according to attribute
correspondences found by instance-based ontology
matching, which can generate attribute correspondences
based on instances. However, it is hard to identify the
same instances across data sets, because some identical
instances have unequal values for corresponding
attributes. (3) Many existing
solutions leverage Genetic Programming to construct
interlinking patterns for comparing instances, but
they suffer from long running times. In this thesis, an
interlinking method is proposed to interlink the same
instances across different data sets, based on both
statistical learning and symbolic learning. The input
is two data sets, class correspondences across the two
data sets and a set of sample links that are assessed
by users as either positive or negative. The method
builds a classifier that distinguishes correct links
and incorrect links across two RDF data sets with the
set of assessed sample links. The classifier is
composed of attribute correspondences across
corresponding classes of two data sets, which help
compare instances and build links. The classifier is
called an interlinking pattern in this thesis. On the
one hand, our method discovers potential attribute
correspondences of each class correspondence via a
statistical learning method, the K-medoids clustering
algorithm, with instance value statistics. On the other
hand, our solution builds the interlinking pattern by a
symbolic learning method, Version Space, with all
discovered potential attribute correspondences and the
set of assessed sample links. Our method can handle
interlinking tasks for which no single conjunctive
interlinking pattern covers all assessed correct links,
while keeping a concise format. Experiments confirm that
our interlinking method with only 1 percent of sample
links already reaches a high F-measure (around
0.94-0.99). The F-measure quickly converges, improving
on other approaches by nearly 10 percent.",
}