Active Learning of Regular Expressions for Entity Extraction
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{Bartoli:2017:ieeeTC,
author = "Alberto Bartoli and Andrea {De Lorenzo} and
Eric Medvet and Fabiano Tarlao",
journal = "IEEE Transactions on Cybernetics",
title = "Active Learning of Regular Expressions for Entity
Extraction",
year = "2018",
volume = "48",
number = "3",
pages = "1067--1080",
month = mar,
keywords = "genetic algorithms, genetic programming, automatic
programming, evolutionary computation, inference
mechanisms, man machine systems, semisupervised
learning, text processing",
ISSN = "2168-2267",
DOI = "doi:10.1109/TCYB.2017.2680466",
size = "14 pages",
abstract = "We consider the automatic synthesis of an entity
extractor, in the form of a regular expression, from
examples of the desired extractions in an unstructured
text stream. This is a long-standing problem for which
many different approaches have been proposed, which all
require the preliminary construction of a large dataset
fully annotated by the user. We propose an active
learning approach aimed at minimizing the user
annotation effort: the user annotates only one desired
extraction and then merely answers extraction queries
generated by the system. During the learning process,
the system digs into the input text for selecting the
most appropriate extraction query to be submitted to
the user in order to improve the current extractor. We
construct candidate solutions with genetic programming
(GP) and select queries with a form of
querying-by-committee, i.e., based on a measure of
disagreement within the best candidate solutions. All
the components of our system are carefully tailored to
the peculiarities of active learning with GP and of
entity extraction from unstructured text. We evaluate
our proposal in depth, on a number of challenging
datasets and based on a realistic estimate of the user
effort involved in answering each single query. The
results demonstrate high accuracy with significant
savings in terms of computational effort, annotated
characters, and execution time over a state-of-the-art
baseline.",
notes = "Also known as \cite{7886274}",
}
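The querying-by-committee step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. In this sketch the committee is a hypothetical fixed list of candidate regexes, whereas in the paper it is drawn from the best GP candidate solutions. The sketch also makes two simplifying assumptions: candidate queries are limited to spans that at least one member extracts, and disagreement is measured as the minority-vote fraction. The names `extractions` and `select_query` are illustrative only.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the text


def extractions(pattern: str, text: str) -> Set[Span]:
    """Spans extracted by one candidate regex (empty matches are ignored)."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def select_query(committee: List[str], text: str) -> Optional[Span]:
    """Hypothetical querying-by-committee: pick the most contested span.

    Each committee member votes for the spans it extracts. The disagreement
    on a span is min(votes_for, votes_against) / len(committee): 0 means the
    committee is unanimous, 0.5 means an even split. Returns None when no
    extracted span is contested.
    """
    votes: Dict[Span, int] = {}
    for pattern in committee:
        for span in extractions(pattern, text):
            votes[span] = votes.get(span, 0) + 1

    n = len(committee)
    best: Optional[Span] = None
    best_score = 0.0
    for span in sorted(votes):  # sorted order makes tie-breaking deterministic
        score = min(votes[span], n - votes[span]) / n
        if score > best_score:
            best, best_score = span, score
    return best


if __name__ == "__main__":
    committee = [r"\d{3}-\d{4}", r"\d+-\d+", r"\d{3}-\d{4}|\d\d"]
    text = "call 555-1234 or ext 42"
    span = select_query(committee, text)
    print(span, repr(text[span[0]:span[1]]))  # (21, 23) '42'
```

In the usage example, all three members extract the phone number `555-1234`, so that span is uncontested. Only the last member also extracts `42`, which makes it the most contested span. That span is the one the user would be asked to label.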