Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering

  • Conference paper
Genetic Programming (EuroGP 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4971)

Abstract

In this paper we apply multiobjective genetic programming to the cost-sensitive classification task of labelling spam e-mails. We consider three publicly available spam corpora and compare our method against support vector machines and naïve Bayes classifiers, both of which are held to perform well on the spam filtering problem. We find that, for the high cost ratios of practical interest, our cost-sensitive multiobjective genetic programming gives the best results across a range of performance measures.
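
To make the notion of a cost ratio concrete, the sketch below computes two cost-sensitive measures that are widely used in the spam-filtering literature: weighted accuracy and total cost ratio. The abstract does not say which performance measures the paper reports, so the choice of these two measures, the function names, and the example counts are all illustrative assumptions rather than the authors' evaluation code. The cost ratio `cost_ratio` expresses how many times worse it is to block a legitimate message than to let a spam message through.

```python
def weighted_accuracy(ham_ok, ham_as_spam, spam_ok, spam_as_ham, cost_ratio):
    """Accuracy in which each legitimate (ham) message counts cost_ratio times.

    All names here are hypothetical; the counts are confusion-matrix cells.
    """
    n_ham = ham_ok + ham_as_spam
    n_spam = spam_ok + spam_as_ham
    return (cost_ratio * ham_ok + spam_ok) / (cost_ratio * n_ham + n_spam)


def total_cost_ratio(ham_as_spam, spam_ok, spam_as_ham, cost_ratio):
    """Cost of using no filter divided by the cost of using the filter.

    Values above 1 mean the filter is better than letting every message through.
    """
    n_spam = spam_ok + spam_as_ham
    filter_cost = cost_ratio * ham_as_spam + spam_as_ham
    if filter_cost == 0:
        return float("inf")  # a perfect filter incurs no cost
    return n_spam / filter_cost


if __name__ == "__main__":
    # Made-up counts: 100 ham (2 blocked) and 100 spam (10 missed).
    for ratio in (1, 9):
        wacc = weighted_accuracy(98, 2, 90, 10, ratio)
        tcr = total_cost_ratio(2, 90, 10, ratio)
        print(f"cost ratio {ratio}: WAcc = {wacc:.3f}, TCR = {tcr:.3f}")
```

With these made-up counts, raising the cost ratio from 1 to 9 lowers the total cost ratio because the two blocked legitimate messages dominate the cost, which is why a classifier tuned for equal costs can look much weaker at high cost ratios.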

Editor information

Michael O’Neill Leonardo Vanneschi Steven Gustafson Anna Isabel Esparcia Alcázar Ivanoe De Falco Antonio Della Cioppa Ernesto Tarantino

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Y., Li, H., Niranjan, M., Rockett, P. (2008). Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering. In: O’Neill, M., et al. Genetic Programming. EuroGP 2008. Lecture Notes in Computer Science, vol 4971. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78671-9_28

  • DOI: https://doi.org/10.1007/978-3-540-78671-9_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78670-2

  • Online ISBN: 978-3-540-78671-9

  • eBook Packages: Computer Science, Computer Science (R0)
