Estimating the Credibility of Examples in Automatic Document Classification
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{Palotti:2010:JIDM,
author = "Joao R. M. Palotti and Thiago Salles and
Gisele L. Pappa and Filipe {de Lima Arcanjo} and
Marcos Andre Goncalves and Wagner {Meira Jr.}",
title = "Estimating the Credibility of Examples in Automatic
Document Classification",
journal = "Journal of Information and Data Management",
year = "2010",
volume = "1",
number = "3",
pages = "439--454",
month = oct,
keywords = "genetic algorithms, genetic programming, credibility,
automatic document classification",
URL = "http://seer.lcc.ufmg.br/index.php/jidm/article/view/85",
biburl = "http://dblp.uni-trier.de/rec/bib/journals/jidm/PalottiSPAGM10",
annote = "The Pennsylvania State University CiteSeerX Archives",
language = "en",
oai = "oai:CiteSeerX.psu:10.1.1.615.5177",
URL = "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.615.5177",
URL = "http://seer.lcc.ufmg.br/index.php/jidm/article/download/85/33/",
size = "16 pages",
abstract = "Classification algorithms usually assume that any
example in the training set should contribute equally to
the classification model being generated. However, this
is not always the case. This paper shows that the
contribution of an example to the classification model
varies according to many factors, which are application
dependent, and can be estimated using what we call a
credibility function. The credibility of an entity
reflects how much value it aggregates to a task being
performed, and here we investigate it in Automatic
Document Classification, where the credibility of a
document relates to its terms, authors, citations,
venues, time of publication, among others. After
introducing the concept of credibility in
classification, we investigate how to estimate a
credibility function using information regarding
documents' content, citations and authorship, using
mainly metrics previously defined in the literature. As
the credibility of the content of a document can be
easily mapped to any other classification problem, in a
second phase we focus on content-based credibility
functions. We propose a genetic programming algorithm
to estimate this function based on a large set of
metrics generally used to measure the strength of
term-class relationship. The proposed and evolved
credibility functions are then incorporated into the
Naive Bayes classifier, and applied to four text
collections, namely ACM-DL, Reuters, Ohsumed, and 20
Newsgroups. The results obtained showed significant
improvements in both micro-F1 and macro-F1, with gains
up to 21 percent in Ohsumed when compared to the
traditional Naive Bayes.",
notes = "SBBD 2010 https://seer.ufmg.br/index.php/jidm/index",
}
Genetic Programming entries for
Joao Palotti
Thiago Cunha de Moura Salles
Gisele L Pappa
Filipe de Lima Arcanjo
Marcos Andre Goncalves
Wagner Meira