Intelligent Fusion of Evidence from Multiple Sources for Text Classification
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@PhdThesis{oai:VTETD:etd-07032006-152103,
title = "Intelligent Fusion of Evidence from Multiple Sources
for Text Classification",
author = "Baoping Zhang",
year = "2006",
month = sep # "~06",
school = "Virginia Polytechnic Institute and State University",
type = "Doctor of Philosophy in Computer Science and
Applications",
address = "USA",
bibsource = "OAI-PMH server at scholar.lib.vt.edu",
contributor = "Dan Spitzner and Chang-Tien Lu and Edward A. Fox and
Weiguo Fan and P{\'a}vel Calado",
language = "en",
oai = "oai:VTETD:etd-07032006-152103",
rights = "unrestricted; I hereby certify that, if appropriate, I
have obtained and attached hereto a written permission
statement from the owner(s) of each third party
copyrighted matter to be included in my thesis,
dissertation, or project report, allowing distribution
as specified below. I certify that the version I
submitted is the same as that approved by my advisory
committee. I hereby grant to Virginia Tech or its
agents the non-exclusive license to archive and make
accessible, under the conditions specified below, my
thesis, dissertation, or project report in whole or in
part in all forms of media, now or hereafter known. I
retain all other ownership rights to the copyright of
the thesis, dissertation or project report. I also
retain the right to use in future works (such as
articles or books) all or part of this thesis,
dissertation, or project report.",
keywords = "genetic algorithms, genetic programming",
URL = "http://scholar.lib.vt.edu/theses/available/etd-07032006-152103/unrestricted/BaopingDissertationFinal.pdf",
URL = "http://scholar.lib.vt.edu/theses/available/etd-07032006-152103/",
size = "146 pages",
abstract = "Automatic text classification using current approaches
is known to perform poorly when documents are noisy or
when limited amounts of textual content are available.
Yet, many users need access to such documents, which
are found in large numbers in digital libraries and in
the WWW. If documents are not classified, they are
difficult to find when browsing. Further, searching
precision suffers when categories cannot be checked,
since many documents may be retrieved that would fail
to meet category constraints. In this work, we study
how different types of evidence from multiple sources
can be intelligently fused to improve classification of
text documents into predefined categories. We present a
classification framework based on an inductive learning
method -- Genetic Programming (GP) -- to fuse evidence
from multiple sources. We show that good classification
is possible with documents which are noisy or which
have small amounts of text (e.g., short metadata
records) -- if multiple sources of evidence are fused
in an intelligent way. The framework is validated
through experiments performed on documents in two
testbeds. One is the ACM Digital Library (using a
subset available in connection with CITIDEL, part of
NSF's National Science Digital Library). The other is
Web data, in particular that portion associated with
the Cad{\^e} Web directory. Our studies have shown that
improvement can be achieved relative to other machine
learning approaches if genetic programming methods are
combined with classifiers such as kNN. Extensive
analysis was performed to study the results generated
through the GP-based fusion approach and to understand
key factors that promote good classification.",
notes = "URN etd-07032006-152103",
}