Creating deep neural networks for text classification tasks using grammar genetic programming

https://doi.org/10.1016/j.asoc.2023.110009

Highlights

  • Application of a grammar-based evolutionary approach to the design of DNNs.

  • Combining techniques and layers from different neural networks to build mixed networks.

  • Use of graph convolution techniques proposed by different authors.

  • Evaluation of the approach, supported by statistical tests on the method.

  • Comparison of the model with state-of-the-art techniques.

Abstract

Text classification is one of the Natural Language Processing (NLP) tasks. Its objective is to label textual elements such as phrases, queries, paragraphs, and documents. Several approaches in NLP have achieved promising results on this task. Deep Learning-based approaches have been widely used in this context, with deep neural networks (DNNs) jointly providing a representation for the data and a learning model. As expected, the increasing scale and complexity of DNN architectures has created new challenges for designing and configuring the models. In this paper, we present a study on the application of a grammar-based evolutionary approach to the design of DNNs, using models based on Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Graph Neural Networks (GNNs). We propose different grammars, defined to capture the features of each type of network, as well as some combinations, and verify their impact on the produced designs and on the performance of the generated models. By modifying Grammatical Evolution (GE), we create a grammar able to generate different networks specialized in text classification; the approach is composed of three main components: the grammar, the mapping, and the search engine. Our results offer promising directions for future research, showing that the designed architectures achieve performance comparable to that of their counterparts and can still be further improved. In the best case, we improved the results of a manually structured neural network by 8.18%.

Section snippets

Code metadata

Permanent link to reproducible Capsule: https://doi.org/10.24433/CO.5469683.v1.

Text classification

Recurrent neural networks (RNNs) are a class of artificial neural networks that exhibit temporal dynamic behavior and can capture word dependencies, which makes them suitable for text classification. Long Short-Term Memory (LSTM) is a recurrent neural network architecture widely used for this purpose. Similarly, Convolutional Neural Networks (CNNs) recognize patterns in the text and highlight unique or distinguishing features, which makes this type of architecture quite popular for text…
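The following minimal PyTorch sketch illustrates the two building blocks discussed above: an LSTM branch that captures word-order dependencies and a 1-D convolutional branch that highlights local n-gram features. It is not the paper's architecture; the class name, vocabulary size, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the LSTM and CNN building blocks discussed above.
# Hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # LSTM branch: captures sequential word dependencies.
        self.lstm = nn.LSTM(embed_dim, 64, batch_first=True)
        # CNN branch: 1-D convolution over 3-word windows (local n-grams).
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64 + 64, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids)                  # (batch, seq_len, embed)
        _, (h_n, _) = self.lstm(x)                     # h_n: (1, batch, 64)
        lstm_feat = h_n[-1]                            # (batch, 64)
        conv_feat = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, 64, seq)
        conv_feat = conv_feat.max(dim=2).values        # global max pooling
        return self.fc(torch.cat([lstm_feat, conv_feat], dim=1))

logits = TextClassifier()(torch.randint(0, 10_000, (4, 50)))
print(logits.shape)  # torch.Size([4, 2])
```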

Grammar-based approach for designing deep neural networks

GE is an evolutionary approach that can evolve programs in an arbitrary programming language [7]. DSGE [34], [37] was presented as an improved version of traditional GE [7]. It proposes an indirect encoding for solutions, which changes how the grammar and the mapping interact to build programs and makes the search process more efficient. Further details can be found in [37].

GE and its variants are mainly defined by three components. Firstly, the grammar, where we define the layers,…
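To make the components concrete, the sketch below shows a toy grammar and a classic GE-style genotype-to-phenotype mapping (DSGE instead keeps one gene list per nonterminal, which avoids wasted codons; see [37]). The grammar fragment and layer vocabulary are hypothetical and far smaller than the paper's TextDSGE grammars, and the search engine that would evolve the integer genotypes is omitted.

```python
# Toy illustration of two of the three GE components named above:
# a grammar and a genotype-to-phenotype mapping. The grammar fragment
# is hypothetical; a real TextDSGE grammar is much larger.
GRAMMAR = {
    "<net>":   [["<layer>"], ["<layer>", " -> ", "<net>"]],
    "<layer>": [["conv(filters=", "<n>", ")"],
                ["lstm(units=", "<n>", ")"],
                ["dense(units=", "<n>", ")"]],
    "<n>":     [["32"], ["64"], ["128"]],
}

def map_genotype(genotype, start="<net>", max_expansions=50):
    """Classic GE mapping: each codon, modulo the number of productions,
    expands the leftmost nonterminal; the genotype wraps if exhausted."""
    out, stack, i = [], [start], 0
    while stack and i < max_expansions:
        sym = stack.pop(0)
        if sym in GRAMMAR:
            rules = GRAMMAR[sym]
            choice = rules[genotype[i % len(genotype)] % len(rules)]
            i += 1
            stack = choice + stack
        else:
            out.append(sym)
    # Leftover nonterminals would signal an invalid (unfinished) mapping.
    return "".join(out + stack)

print(map_genotype([1, 0, 2, 0, 1, 1]))
# conv(filters=128) -> lstm(units=64)
```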

Design of networks for text classification

In this section, the DSGE approach for the design of neural networks for text classification is described. DSGE allows building architectures based on CNNs, LSTMs, and GNNs through the proposed grammar, which combines building blocks based on such networks. In general terms, the TextDSGE approach is presented in Fig. 7. As input, the framework receives a text representation in vector space. The next module splits the input set into n folds. Each fold is characterized by a set of training,…
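As a rough sketch of the input stages just named, the snippet below vectorizes raw texts and splits them into n folds with scikit-learn. It only mirrors the pipeline's front end; the evolved-network training step is replaced by a comment, since TextDSGE itself is available in the linked capsule.

```python
# Front end of the pipeline described above: texts -> vector space -> n folds.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold

texts = ["good movie", "bad plot", "great cast", "dull story"]
labels = [1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)   # text representation in vector space
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, labels)):
    # Here TextDSGE would train each candidate architecture on the
    # training split and score it on the held-out split.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```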

Experiments

In this section, we present the experiments designed to evaluate our approach. First, the datasets are described, followed by the metrics used to evaluate the models. Finally, the results are presented and discussed.
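For illustration, a typical evaluation step in this setting computes per-fold classification metrics and compares two models with a paired statistical test. The snippet below assumes accuracy and macro-F1 as metrics and a Wilcoxon signed-rank test over hypothetical per-fold scores; the paper's exact metric and test choices are given in the full text.

```python
# Illustrative evaluation: classification metrics plus a paired
# statistical test over per-fold scores. All numbers are made up.
from sklearn.metrics import accuracy_score, f1_score
from scipy.stats import wilcoxon

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print("accuracy:", accuracy_score(y_true, y_pred))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))

# Paired per-fold scores of two hypothetical models.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.77, 0.82, 0.79, 0.80]
stat, p = wilcoxon(model_a, model_b)
print(f"Wilcoxon p-value: {p:.3f}")
```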

Conclusions

Text classification remains a relevant research thread. The search for new approaches and models is a constant in the field of NLP, and the application of DL-based models has been a key feature in these tasks. In this context, the construction of neural network architectures has relied on fine-tuning by specialists, increasing the complexity and the number of hyperparameters to a point where significant effort is needed to go further. To that extent, designing better DNNs…

CRediT authorship contribution statement

Dimmy Magalhães: Conceptualization, Methodology, Software, Writing – original draft, Formal analysis. Ricardo H.R. Lima: Conceptualization, Methodology, Software, Writing – original draft, Formal analysis. Aurora Pozo: Writing – review & editing, Validation, Supervision.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Ricardo H.R. Lima reports financial support provided by the Coordination of Higher Education Personnel Improvement. Aurora Pozo reports financial support provided by the National Council for Scientific and Technological Development. One of the authors is a public servant granted leave for doctoral studies by the Court of Justice of the State of Piauí.

Acknowledgments

This work was funded by CAPES, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, and the Tribunal de Justiça do Estado do Piauí (TJPI).

The authors would like to thank the Academic Publishing Advisory Center (Centro de Assessoria de Publicação Acadêmica, CAPA – www.capa.ufpr.br) of the Federal University of Paraná (UFPR) for assistance with English language developmental editing.

References (58)

  • Ryan, C., et al., Grammatical evolution: Evolving programs for an arbitrary language.

  • Poli, R., et al., Genetic programming.

  • Reimers, N., et al., Sentence-BERT: Sentence embeddings using siamese BERT-networks.

  • Devlin, J., et al., BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR (2018).

  • Baliyan, A., et al., Multilingual sentiment analysis using RNN-LSTM and neural machine translation.

  • Shin, J., et al., Contextual-CNN: A novel architecture capturing unified meaning for sentence classification.

  • Kalchbrenner, N., et al., A convolutional neural network for modelling sentences (2014).

  • Liu, P., et al., Recurrent neural network for text classification with multi-task learning.

  • Young, T., et al., Recent trends in deep learning based natural language processing [Review article], IEEE Comp. Int. Mag. (2018).

  • Socher, R., et al., Recursive deep models for semantic compositionality over a sentiment treebank.

  • Yao, L., et al., Graph convolutional networks for text classification, CoRR (2018).

  • Marcheggiani, D., et al., Encoding sentences with graph convolutional networks for semantic role labeling.

  • Beck, D., et al., Graph-to-sequence learning using gated graph neural networks (2018).

  • Cetoli, A., et al., Graph convolutional networks for named entity recognition, CoRR (2017).

  • Miller, G.F., Todd, P.M., Hegde, S.U., Designing neural networks using genetic algorithms, in: Proceedings of the...

  • Talbi, E.-G., Automated design of deep neural networks: A survey and unified taxonomy, ACM Comput. Surv. (2021).

  • Fogel, D.B., et al., Evolving neural networks, Biol. Cybernet. (1990).

  • Ding, S., et al., An optimizing BP neural network algorithm based on genetic algorithm, Artif. Intell. Rev. (2011).

  • Petroski Such, F., et al., Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning (2017).

    The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
