Analysis of Grammatical Evolution Approaches to Regular Expression Induction
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{Gonzalez-Pardo:2011:AoGEAtREI,
title = "Analysis of Grammatical Evolution Approaches to
Regular Expression Induction",
author = "Antonio Gonzalez-Pardo and David Camacho",
pages = "632--639",
booktitle = "Proceedings of the 2011 IEEE Congress on Evolutionary
Computation",
year = "2011",
editor = "Alice E. Smith",
month = "5-8 " # jun,
address = "New Orleans, USA",
organization = "IEEE Computational Intelligence Society",
publisher = "IEEE Press",
ISBN = "0-7803-8515-2",
keywords = "genetic algorithms, genetic programming, grammatical
evolution, Data mining",
DOI = "doi:10.1109/CEC.2011.5949679",
abstract = "Regular expressions, or regexes, have been used
traditionally as a pattern matching tool to search for
structures in a set of objects, like files, text
documents or folders. Pattern matching can be used to
look for files whose name contains a given string, to
search files that contain a specific pattern within
them, or simply to extract text in a set of documents.
It is very popular to apply regexes to detect and
extract patterns that represent phone numbers, URLs,
email addresses, etc. These kind of information can be
characterised because it has a well defined structure.
Nevertheless, regexes are not very frequently used
because its high complexity in both, syntax and
grammatical rules, makes regexes difficult to
understand. For this reason, the development of
programs able to automatically generate, and evaluate,
regexes has become a valuable task. This work analyses
the performance of different grammatical evolutionary
approaches in the generation of regexes able to extract
URL patterns. Four different types of grammars have
been evaluated: a context-free grammar, a context-free
grammar with a penalised fitness function, an
extensible context-free grammar, and a Christiansen
grammar. For the considered problem, the experimental
results show that the best performance of the system,
measured as cumulative success rate, is achieved using
Christiansen grammars.",
notes = "CEC2011 sponsored by the IEEE Computational
Intelligence Society, and previously sponsored by the
EPS and the IET.",
}
Genetic Programming entries for
Antonio Gonzalez-Pardo
David Camacho