Filtering Junk E-Mail: A Performance Comparison between Genetic Programming and Naive Bayes
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Unpublished{katirai99,
author = "Hooman Katirai",
title = "Filtering Junk E-Mail: A Performance Comparison
between Genetic Programming and Naive Bayes",
year = "1999",
month = "10 " # sep,
note = "4A Year student project",
URL = "http://www.mit.edu/~hooman/papers/katirai99filtering.pdf",
URL = "http://citeseer.nj.nec.com/katirai99filtering.html",
URL = "http://citeseer.ist.psu.edu/310632.html",
keywords = "genetic algorithms, genetic programming, digital
communications, spam, UBE",
abstract = "This paper describes the application of genetic
programming as a novel approach to the problem of
filtering junk e-mail. We benchmark our results against
the common standard: the naive Bayes classifier. While
the genetically programmed classifier demonstrated a
precision comparable to that of naive Bayes, it was
slightly outperformed in recall. Since both learning
methods gave similar results, it is recommended that a
larger study be undertaken to ascertain whether these
differences are indeed statistically significant.
Further, it is recommended that the performance of these
classifiers be tested in a richer feature space more
typical of real-world classifiers. Although the
genetically programmed classifier greatly outperformed
the naive Bayes classifier in speed, it is concluded
that a more efficient implementation of naive Bayes
needs to be used in order to provide a fair comparison.
We show that, when left unabated, e-mail signatures (also
known as taglines) reduce the value of several important
features in junk e-mail detection; however, it is also
shown that these e-mail signatures may be harvested as
advantageous features if some of their components are
removed and noted as a feature. We therefore recommend
that a better parser capable of meeting these criteria
be implemented. To aid the reader in the theoretical
aspects of our work, we have included introductory
background for both approaches, including a full
derivation of the generative naive Bayes model.",
size = "27 pages",
}