abstract = "Affymetrix High Density Oligonucleotide Arrays (HDONA)
simultaneously measure expression of thousands of genes
using millions of probes. We use correlations between
measurements for the same gene across 6685 human tissue
samples from NCBI's GEO database to indicate the
quality of individual HG-U133A probes. Low concordance
indicates a poor probe. Regular expressions can be data
mined by a Backus-Naur form (BNF) context-free grammar
using strongly typed genetic programming written in
gawk and using egrep. The automatically produced motif
is better at predicting poor DNA sequences than an
existing human-generated RE, suggesting runs of
Cytosine and Guanine and mixtures should all be
avoided.
Code is available at
ftp://ftp.cs.ucl.ac.uk/genetic/gp-code/re_gp.tar",