An Embedded Unlabeled Data Partitioning PU-Learning Method Based on Genetic Programming
Created by W.Langdon from
gp-bibliography.bib Revision:1.8355
@Article{Zhou:TCSS,
author = "Yu Zhou and Nanjian Yang and Ran Wang",
title = "An Embedded Unlabeled Data Partitioning {PU-Learning}
Method Based on Genetic Programming",
journal = "IEEE Transactions on Computational Social Systems",
note = "Early Access",
keywords = "genetic algorithms, genetic programming, Task
analysis, Mathematical models, Data models, Training,
Supervised learning, Sociology, Iterative methods, Data
partitioning, evolutionary computation,
positive-unlabeled learning (PUL), text
classification",
ISSN = "2329-924X",
DOI = "doi:10.1109/TCSS.2024.3406377",
abstract = "In traditional binary classification tasks, learning
algorithms conventionally distinguish positive and
negative samples by leveraging fully labeled training
data. However, in practical applications, there
frequently arises a scenario where only a small number
of positive samples and a large volume of unlabeled
data exist, with the latter potentially containing a
mix of positive and negative instances. This situation
is known as positive-unlabeled learning (PUL) and has
drawn considerable interest across various domains,
such as text categorization. Despite numerous studies
addressing PUL issues, few have focused on improving
the two-step methodology in the data partitioning
process, and even fewer have explored the application
of genetic programming (GP) to PUL challenges. This
article introduces a GP-based embedded unlabeled data
partitioning method (EPGP) tailored for the PUL
problem, particularly in the context of few-shot
learning scenarios. The approach adopts a multiphase
data partitioning strategy, integrating the
partitioning process into the evolutionary cycle of the
GP population, progressively isolating positive samples
from the unlabeled data to yield a PU dataset closer to
the real-world distribution. To achieve smoother data
partitioning, a dynamically adjusted partitioning
threshold strategy is incorporated. Finally, an
ensemble method is devised, capitalizing on the high
confidence associated with the originally labeled
positive samples to generate the final classification
outcome via weighted voting. Experimental evaluations
of EPGP against other state-of-the-art PUL methods on
22 datasets show significant advantages in terms of
balanced accuracy and macro-F1 scores on 17 datasets.
Moreover, a case study on text data classification
tasks demonstrates that EPGP consistently delivers
substantial performance enhancements under varying
proportions of labeled positive samples.",
notes = "Also known as \cite{10559613}",
}