Abstract
Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure the expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI’s GEO database to indicate the quality of individual HG-U133A probes. Low concordance indicates a poor probe. Regular expressions can be data mined using a strongly typed genetic programming system, defined by a Backus-Naur form (BNF) context-free grammar, written in gawk and using egrep. The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine, runs of Guanine, and their mixtures should all be avoided.
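The two ideas in the abstract, scoring a probe by its concordance with other probes for the same gene and predicting poor probes from a sequence motif, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the concordance score is taken to be the mean Pearson correlation with the other probes in the probeset, Python's `re` module stands in for egrep, the toy intensities are invented, and `ILLUSTRATIVE_MOTIF` is a hypothetical pattern, not the motif evolved in the paper.

```python
import math
import re


def pearson(x, y):
    """Pearson correlation of two equal-length lists of measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant probe carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def probe_concordance(probeset):
    """Mean correlation of each probe with the other probes for the same gene.

    `probeset` maps a probe name to its measurements across samples.
    Assumed scoring rule for this sketch; needs at least two probes.
    """
    if len(probeset) < 2:
        raise ValueError("need at least two probes to measure concordance")
    scores = {}
    for name, values in probeset.items():
        others = [pearson(values, other_values)
                  for other, other_values in probeset.items() if other != name]
        scores[name] = sum(others) / len(others)
    return scores


# Hypothetical motif: a run of four or more G or four or more C.
# This is NOT the evolved regular expression from the paper.
ILLUSTRATIVE_MOTIF = re.compile(r"G{4,}|C{4,}")


def predicted_poor(sequence):
    """True if the probe sequence matches the illustrative motif."""
    return ILLUSTRATIVE_MOTIF.search(sequence) is not None


# Invented toy data: three probes for one gene measured on five samples.
toy_probeset = {
    "p1": [1, 2, 3, 4, 5],
    "p2": [2, 4, 6, 8, 10],
    "p3": [5, 1, 4, 2, 3],
}
scores = probe_concordance(toy_probeset)
worst = min(scores, key=scores.get)  # probe with the lowest concordance
```

In this toy data `p3` disagrees with the other two probes, so it receives the lowest score. A motif is then judged by how well `predicted_poor` separates low-scoring probes from the rest.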
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Langdon, W.B., Harrison, A.P. (2008). Evolving Regular Expressions for GeneChip Probe Performance Prediction. In: Rudolph, G., Jansen, T., Beume, N., Lucas, S., Poloni, C. (eds) Parallel Problem Solving from Nature – PPSN X. PPSN 2008. Lecture Notes in Computer Science, vol 5199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87700-4_105
DOI: https://doi.org/10.1007/978-3-540-87700-4_105
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87699-1
Online ISBN: 978-3-540-87700-4
eBook Packages: Computer Science (R0)