Abstract
The use of genetic programming for probabilistic pattern matching is investigated. A stochastic regular expression language is used. The language features a statistically sound semantics, as well as a syntax that promotes efficient manipulation by genetic programming operators. An algorithm for efficient string recognition based on approaches in conventional regular language recognition is used. When attempting to recognize a particular test string, the recognition algorithm computes the probabilities of generating that string and all its prefixes with the given stochastic regular expression. To promote efficiency, intermediate computed probabilities that exceed a given cut-off value will pre-empt particular interpretation paths, and hence prune unconstructive interpretation. A few experiments in recognizing stochastic regular languages are discussed. Application of the technology in bioinformatics is in progress.
Similar content being viewed by others
References
D. Angluin, “Computational learning theory: survey and selected bibliography,” in Proceedings of the 24th Annual ACM Symposium on the Theory of Computing,ACMPress: NewYork, 1992, pp. 351–369.
K.S. Fu and T.L. Booth, “Grammatical inference: introduction and survey–Part I,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 5, no. 1, pp. 95–111, January 1975.
K.S. Fu and T.L. Booth, “Grammatical inference: introduction and survey–Part II,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 5, no. 4, pp. 409–423, July 1975.
Y. Sakakibara, “Recent advances of grammatical inference,” Theoretical Computer Science, vol. 185, pp. 15–45, 1997.
K.S. Fu and T. Huang, “Stochastic grammars and languages,” International Journal of Computer and Information Sciences, vol. 1, no. 2, pp. 135–170, 1972.
K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall: Englewood Cliffs, NJ, 1982.
K. Lari and S.J. Young, “The estimation of stochastic contextfree grammars using the Inside-Outside algorithm,” Computer Speech and Language, vol. 4, pp. 35–56, 1990.
E. Charniak, Statistical Language Learning, MIT Press: Cambridge, MA, 1993.
J.E. Hopcroft and J.D. Ullman, Introduction toAutomata Theory, Languages, and Computation, Addison Wesley: Reading, MA, 1979.
R.C. Carrasco and M.L. Forcada, “Inferring stochastic regular grammars with recurrent neural networks,” in Proceedings 3rd International Colloqium on Grammatical Inference (ICGI96), edited by I. Miclet and C. de la Huguera, Springer-Verlag: Berlin 1996, pp. 274–286. LNAI 1147.
R.C. Carrasco and J. Oncina, “Learning deterministic regular grammars from stochastic samples in polynomial time,” Technical Report DLSI-96–01, Universidad de Alicante, April 1998.
F.J. Maryanski and T.L. Booth, “Inference of finite-state probabilistic grammars,” IEEE Transactions on Computers, vol. C26, pp. 521–536, 1977.
A. van der Mude and A.Walker, “On the inference of stochastic regular grammars,” Information and Control, vol. 38, pp. 310–329, 1978.
L.R. Rabiner and B.H. Juang, “An introduction to Hidden markov models,” in IEEE ASSP Magazine, pp. 4–16, January 1986.
L.R. Rabiner, “Atutorial on Hidden markov models and selected applications in speech recognition,” in Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, February 1989.
V.K. Garg, R. Kumar, and S.I. Marcus, “Probabilistic language framework for stochastic discrete event systems,” Technical Report 96–18, Institute for Systems Research, April 1996. http://www.isr.umc.edu/.
H. Zhou and J.J. Grefenstette, “Induction of finite automata by genetic algorithms,” in Proc. 1986 IEEE Intl. Conference on Systems, Man, and Cybernetics, Atlanta, GA, IEEE Press: New York, 1986, pp. 170–174.
B.D. Dunay, F.E. Petry, and B.P. Buckles, “Regular language induction with genetic programming,” in Proc. 1st IEEE Conference on Evolutionary Computation, 1994, pp. 396–400.
P. Dupont, “Regular grammatical inference from positive and negative samples by genetic search: the GIG method,” in 2nd Intl. Coll. on Grammatical Inference and Applications, Springer-Verlag: Berlin, 1994, pp. 236–245.
M. Tomita, “Dynamic construction of finite automata from examples using hill-climbing,” in 4th Annual Conf. of the Cognitive Science Soc., 1982, pp. 105–108.
S. Brave, “Evolving deterministic finite automata using cellular encoding,” in Proc. Genetic Programming 1997, Stanford University, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann: Los Altos, CA, 1997, pp. 39–44.
T. Longshaw, “Evolutionary learning of large grammars,” in Proc. Genetic Programming 1997, Stanford University, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann, Los Altos, CA, 1997, pp. 406–409.
B. Svingen, “Learning regular languages using genetic programming,” in Proc. Genetic Programming 1998, edited by J.R. Koza et al., Morgan Kaufmann, Los Altos, CA, 1998, pp. 374–376.
P. Wyard, “Context free grammar induction using genetic algorithms,” in Proceedings 4th International Conference on Genetic Algorithms, 1991, pp. 514–518.
M.M. Lankhorst, “Grammatical inference with a genetic algorithm,” in Proceedings of the 1994 EUROSIM Conference on Massively Parallel Processing Applications and Development, 1994, pp. 423–430.
S. Lucas, “Structuring chromosomes for context-free grammar evolution,” in Proceedings 1st International Conference on Evolutionary Computation, IEEE Press: New York, 1994, pp. 130–135.
S. Sen and J. Janakiraman, “Learning to construct pushdown automata for accepting deterministic context-free languages,” Applications of Artificial Intelligence X: Knowledge-Based Systems, vol. 1707, pp. 207–213, 1992.
M.M. Lankhorst, “Agenetic algorithm for the induction of pushdown automata,” in 1995 IEEE Intl. Conference on Evolutionary Computation, IEEE Press: New York, 1995.
B.D. Dunay and F.E. Petry, “Solving complex problems with genetic algorithms,” in Proc. 6th Intl. Conf. on Genetic Algorithms, edited by L. Eshelman, Morgan Kaufmann: Los Altos, CA, 1995.
M. Schwehm and A. Ost, “Inference of stochastic regular grammars by massively parallel genetic algorithms,” in Proc. 6th Intl. Conf. on Genetic Algorithms, Morgan-Kaufmann: Los Altos, CA, 1995.
T.E. Kammeyer and R.K. Belew, “Stochastic context-free grammar induction with a genetic algorithm using local search,” in Foundations of Genetic Algorithms IV, edited by R.K. Belew and M. Vode, Morgan-Kaufmann: Los Altos, CA, 1997.
J. Stoy, Denotational Semantics, MIT Press: Cambridge, MA, 1977.
K. Subrahmaniam, A Primer in Probability, Marcel Dekker: New York, 1979.
M. Sipser, Introduction to the Theory of Computation,PWSPub. Co., 1996.
M. Hennessy, The Semantics of Programming Languages–An Elementary Introduction Using Structural Operational Semantics, John Wiley and Sons: New York, 1990.
W.F. Clocksin and C.S. Mellish, Programming in Prolog, 4th ed, Springer-Verlag: Berlin, 1994.
B.J. Ross, “Logic-based genetic programming with definite clause translation grammars,” in Proc. GECCO-99, edited by J.R. Koza et al., 1999.
P.A. Whigham, “Grammatically-based genetic programming,” in Proceedings Workshop on Genetic Programming: From Theory to Real-World Applications, edited by J.P. Rosca, 1995, pp. 31–41.
M.L. Wong and K.S. Leung, “Learning programs in different paradigms using genetic programming,” in Proceedings 4th Congress of the Italian Association for AI, 1995, pp. 353–364.
A. Geyer-Shulz, “The next 700 programming languages for genetic programming,” in Proc. Genetic Programming 1997, Stanford University, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann: Los Altos, CA, 1997, pp. 128–136.
H. Abramson and V. Dahl, Logic Grammars, Springer-Verlag, 1989.
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, 2nd edn. Cambridge University Press: Cambridge, 1992.
A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, “Approaches to the automatic discovery of patterns in biosequences,” Technical Report 113, Department of Informatics, University of Bergen, Norway, December 1995.
K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch, “The PROSITE database, its status in 1999,” Nucleic Acids Research, vol. 27, no. 1, pp. 215–219, 1999.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Ross, B.J. Probabilistic Pattern Matching and the Evolution of Stochastic Regular Expressions. Applied Intelligence 13, 285–300 (2000). https://doi.org/10.1023/A:1026524328760
Issue Date:
DOI: https://doi.org/10.1023/A:1026524328760