Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{Tu:2020:CEC,
author = "Chaofan Tu and Menglin Cui",
booktitle = "2020 IEEE Congress on Evolutionary Computation (CEC)",
title = "Learning Regular Expressions for Interpretable Medical
Text Classification Using a Pool-based Simulated
Annealing Approach",
year = "2020",
editor = "Yaochu Jin",
month = "19-24 " # jul,
keywords = "genetic algorithms, genetic programming, Medical
diagnostic imaging, Simulated annealing, Task analysis,
Neural networks, Machine learning, Medical services,
Knowledge engineering, simulated annealing, regular
expression, medical text classification",
isbn13 = "978-1-7281-6929-3",
DOI = "doi:10.1109/CEC48606.2020.9185650",
abstract = "In this paper, we propose a rule-based engine composed
of high-quality and interpretable regular expressions
for medical text classification. The regular
expressions are autogenerated by a constructive
heuristic method and optimized using a Pool-based
Simulated Annealing (PSA) approach. Although existing
Deep Neural Network (DNN) methods present high quality
performance in most Natural Language Processing (NLP)
applications, the solutions are regarded as
uninterpretable {"}black boxes{"} to humans. Therefore,
rule-based methods are often introduced when
interpretable solutions are needed, especially in the
medical field. However, the construction of regular
expressions can be extremely labour-intensive for large
data sets. This research aims to reduce the manual
efforts while maintaining high-quality solutions. The
Pool-based Simulated Annealing method is proposed to
automatically optimize the performance of
machine-generated regular expressions without human
interference. The proposed method is tested on
real-life data provided by one of China's largest
online medical platforms. Experimental results show
that the proposed PSA method further improves the
performance of initial machine-generated regular
expressions compared with other meta-heuristics such as
Genetic Programming. We also believe that the proposed
method can serve as a vital complementary tool for the
existing machine learning approaches in text
classification applications when high levels of
interpretability of the solutions are required.",
notes = "Also known as \cite{9185650}",
}