Automatic Discovery Using Genetic Programming of an Unknown-Sized Detector of Protein Motifs Containing Repeatedly-used Subexpressions
Created by W.Langdon from
gp-bibliography.bib Revision:1.8129
@InProceedings{koza:1995:protien,
author = "John R. Koza and David Andre",
title = "Automatic Discovery Using Genetic Programming of an
Unknown-Sized Detector of Protein Motifs Containing
Repeatedly-used Subexpressions",
booktitle = "Proceedings of the Workshop on Genetic Programming:
From Theory to Real-World Applications",
year = "1995",
editor = "Justinian P. Rosca",
pages = "89--97",
address = "Tahoe City, California, USA",
month = "9 " # jul,
keywords = "genetic algorithms, genetic programming",
URL = "http://www.genetic-programming.com/jkpdf/ml1995motif.pdf",
size = "9 pages",
abstract = "Automated methods of machine learning may be useful in
discovering biologically meaningful patterns that are
hidden in the rapidly growing databases of genomic and
protein sequences. However, almost all existing methods
of automated discovery require that the user specify,
in advance, the size and shape of the pattern that is
to be discovered. Moreover, existing methods do not
have a workable analog of the idea of a reusable
subroutine to exploit the recurring sub-patterns of a
problem environment. Genetic programming can evolve
complicated problem-solving expressions of unspecified
size and shape. When automatically defined functions
are added to genetic programming, genetic programming
becomes capable of efficiently capturing and exploiting
recurring sub-patterns. This paper describes how
genetic programming with automatically defined
functions successfully evolved motifs for detecting the
D-E-A-D box family of proteins and for detecting the
manganese superoxide dismutase family. Both motifs were
evolved without prespecifying their length. Both
evolved motifs employed automatically defined functions
to capture the repeated use of common subexpressions.
When tested against the SWISS-PROT database of
proteins, the two genetically evolved consensus motifs
detect the two families either as well as, or slightly
better than, the comparable human-written motifs found
in the PROSITE database.",
notes = "GP successfully evolved code for detecting the D-E-A-D
box family of proteins which worked as well as or better
than human-written code",
notes = "part of \cite{rosca:1995:ml}",
}