
Evolving Text Classifiers with Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3003)

Abstract

We describe a method for using Genetic Programming (GP) to evolve document classifiers. The evolved GP programs create regular-expression-type specifications consisting of particular sequences and patterns of N-Grams (character strings), and acquire fitness by producing expressions that match documents in a particular category but do not match documents in any other category. Libraries of N-Gram patterns have been evolved against sets of pre-categorised training documents and are used to discriminate between new texts. We describe a basic set of functions and terminals and provide results from a categorisation task using the 20 Newsgroups data set.
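
The fitness idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The representation (an individual as an ordered list of N-grams joined into a regular expression that allows arbitrary gaps), the use of the F1 score as the fitness measure, the toy data and the names `to_regex`, `fitness` and `classify` are all hypothetical. The GP machinery itself (function and terminal set, crossover, mutation, selection) is omitted. The sketch only evaluates a single individual against labelled documents and then applies an evolved library to a new text.

```python
import re
from typing import Dict, List, Sequence


# Hypothetical representation: an individual is an ordered list of N-grams.
# It matches a document if the N-grams occur in that order, with any text
# (possibly none) between them.
def to_regex(ngrams: Sequence[str]) -> "re.Pattern[str]":
    return re.compile(".*?".join(re.escape(g) for g in ngrams), re.DOTALL)


def fitness(ngrams: Sequence[str], docs: Sequence[str],
            labels: Sequence[str], target: str) -> float:
    """F1 score of the pattern used as a binary classifier for `target`
    (an assumed fitness measure): matches inside the category raise it,
    matches outside the category and misses inside it lower it."""
    pattern = to_regex(ngrams)
    tp = fp = fn = 0
    for doc, label in zip(docs, labels):
        hit = pattern.search(doc) is not None
        if hit and label == target:
            tp += 1  # correct match in the target category
        elif hit:
            fp += 1  # matched a document from another category
        elif label == target:
            fn += 1  # missed a document from the target category
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def classify(library: Dict[str, Sequence[str]], doc: str) -> List[str]:
    """Return every category whose (hypothetical) evolved pattern matches `doc`."""
    return [cat for cat, ngrams in library.items()
            if to_regex(ngrams).search(doc) is not None]


# Toy pre-categorised training data, invented for illustration only.
docs = ["the pitcher threw a fastball",
        "the goalie blocked the puck",
        "a fastball and a curveball"]
labels = ["baseball", "hockey", "baseball"]

print(fitness(["ball"], docs, labels, "baseball"))         # 1.0
print(fitness(["the", "ball"], docs, labels, "baseball"))  # about 0.667: misses the third document

library = {"baseball": ["ball"], "hockey": ["puck"]}
print(classify(library, "the goalie blocked the puck"))    # ['hockey']
```

A pattern that matches every training document in its category and none outside it scores 1.0; missed or false matches lower the score, which is the pressure an evolutionary search would exploit. Returning a list from `classify` lets a text match no category or several; how such cases are resolved is not stated in the abstract.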

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hirsch, L., Saeedi, M., Hirsch, R. (2004). Evolving Text Classifiers with Genetic Programming. In: Keijzer, M., O’Reilly, UM., Lucas, S., Costa, E., Soule, T. (eds) Genetic Programming. EuroGP 2004. Lecture Notes in Computer Science, vol 3003. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24650-3_29

  • DOI: https://doi.org/10.1007/978-3-540-24650-3_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21346-8

  • Online ISBN: 978-3-540-24650-3

  • eBook Packages: Springer Book Archive