Transductive transfer learning based Genetic Programming for balanced and unbalanced document classification using different types of features
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{FU:2021:ASC,
author = "Wenlong Fu and Bing Xue and Xiaoying Gao and
Mengjie Zhang",
title = "Transductive transfer learning based Genetic
Programming for balanced and unbalanced document
classification using different types of features",
journal = "Applied Soft Computing",
volume = "103",
pages = "107172",
year = "2021",
ISSN = "1568-4946",
DOI = "doi:10.1016/j.asoc.2021.107172",
URL = "https://www.sciencedirect.com/science/article/pii/S1568494621000958",
keywords = "genetic algorithms, genetic programming, Document
classification, Transfer learning",
abstract = "Document classification is one of the predominant
tasks in Natural Language Processing. However, some
document classification tasks do not have ground truth
while other similar datasets may have ground truth.
Transfer learning can use similar datasets with ground
truth to train effective classifiers on the dataset
without ground truth. This paper introduces a
transductive transfer learning method for document
classification using two different text feature
representations: the term frequency (TF) and the
semantic feature doc2vec. It has three main
contributions. First, it enables the sharing of knowledge
in a dataset using TF and a dataset using doc2vec in
transductive transfer learning for performance
improvement. Second, it demonstrates that the partially
learned programs from TFs and from doc2vecs can be
alternately used to {"}label then learn{"} and they
improve each other. Lastly, it addresses the unbalanced
dataset problem by considering the unbalanced
distributions on categories for evolving proper Genetic
Programming (GP) programs on the target domains. Our
experimental results on two popular document datasets
show that the proposed technique effectively transfers
knowledge from the GP programs evolved from the source
domains to the new GP programs on the target domains
using TF or doc2vec. An improvement of more than 10
percent is achieved by the GP programs
evolved by the proposed method over the GP programs
directly evolved from the source domains. Also, the
proposed technique effectively uses GP programs evolved
from unbalanced datasets (on the source and target
domains) to evolve new GP programs on the target
domains, which balances predictions on different
categories.",
}