Automatic discovery of protein motifs using genetic programming
Created by W.Langdon from
gp-bibliography.bib Revision:1.7954
@InCollection{koza:1996:adpmECTA,
author = "John R. Koza and David Andre",
title = "Automatic discovery of protein motifs using genetic
programming",
booktitle = "Evolutionary Computation: Theory and Applications",
publisher = "World Scientific",
year = "1999",
editor = "Xin Yao",
chapter = "5",
pages = "171--197",
address = "Singapore",
keywords = "genetic algorithms, genetic programming, DEAD box,
SWISSPROT, PROSITE",
ISBN = "981-02-2306-4",
URL = "http://www.genetic-programming.com/jkpdf/ecta1999.pdf",
abstract = "Automated methods of machine learning may prove to be
useful in discovering biologically meaningful
information hidden in the rapidly growing databases of
DNA sequences and protein sequences. Genetic
programming is an extension of the genetic algorithm in
which a population of computer programs is bred, over a
series of generations, in order to solve a problem.
Genetic programming is capable of evolving complicated
problem-solving expressions of unspecified size and
shape. Moreover, when automatically defined functions
are added to genetic programming, genetic programming
becomes capable of efficiently capturing and exploiting
recurring sub-patterns. This chapter describes how
genetic programming with automatically defined
functions successfully evolved motifs for detecting the
D-E-A-D box family of proteins and for detecting the
manganese superoxide dismutase family. Both motifs were
evolved without prespecifying their length. Both
evolved motifs employed automatically defined functions
to capture the repeated use of common subexpressions.
When tested against the SWISS-PROT database of
proteins, the two genetically evolved consensus motifs
detect the two families either as well as, or slightly
better than, the comparable human-written motifs found
in the PROSITE database.",
notes = "ECTA. Two ADFs, each with OR in its function set (i.e. a
choice between two alternative amino acids at a given
position). The result-producing branch has AND (i.e. two
amino acids, or sets of amino acids, adjacent along the
backbone). Covariance fitness, with a formula in terms of
the number of true positives etc.
Jury method: 12 motifs evolved by separate GP runs are
combined into one by requiring a unanimous jury decision.
(Combined by hand or automatically?)
Parallel GP system, 64 transputer nodes.",
}
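
The ideas in the notes field can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes a motif can be represented as a list of sets of allowed residues (a set standing in for OR, consecutive positions for AND), and it assumes the fitness resembles the standard correlation coefficient over true/false positives and negatives; the exact formula used in the chapter is not given here. The names `matches`, `correlation` and `jury` are hypothetical.

```python
from math import sqrt


def matches(motif, sequence):
    """Return True if the motif matches some window of the sequence.

    motif is a list of sets: each set lists the residues allowed at
    that position (OR), and all consecutive positions must match (AND).
    """
    width = len(motif)
    return any(
        all(sequence[start + i] in allowed for i, allowed in enumerate(motif))
        for start in range(len(sequence) - width + 1)
    )


def correlation(tp, tn, fp, fn):
    """Correlation between predicted and true family membership.

    Assumed stand-in for the fitness measure; returns 0.0 when the
    denominator vanishes.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def jury(motifs, sequence):
    """Unanimous jury: accept only if every motif matches the sequence."""
    return all(matches(m, sequence) for m in motifs)


# Illustrative motif only (the literal D-E-A-D string, not an evolved motif).
dead = [set("D"), set("E"), set("A"), set("D")]
print(matches(dead, "MKDEADLL"))
print(jury([dead, [set("DE"), set("E")]], "MKDEADLL"))
```

A unanimous jury can only reduce the number of sequences accepted relative to any single member, so it trades possible false negatives for fewer false positives.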
Genetic Programming entries for
John Koza
David Andre