Artificial neural network development by means of a novel combination of grammatical evolution and genetic algorithm

https://doi.org/10.1016/j.engappai.2014.11.003

Abstract

The most important problems in exploiting artificial neural networks (ANNs) are designing the network topology, which usually requires excessive expert effort, and training it. In this paper, a new evolutionary algorithm is developed to simultaneously evolve the topology and the connection weights of ANNs by means of a new combination of grammatical evolution (GE) and a genetic algorithm (GA). GE is adopted to design the network topology, while the GA is incorporated for better weight adaptation. The proposed algorithm requires minimal expert effort for customization and is capable of generating any feedforward ANN with one hidden layer. Moreover, because the generalization ability of an ANN may decrease owing to overfitting, the algorithm utilizes a novel adaptive penalty approach to simplify the ANNs generated through the evolution process. As a result, it produces much simpler ANNs that have better generalization ability and are easier to implement. The proposed method is tested on several real-world classification datasets, and the results are statistically compared against existing methods in the literature. The results indicate that our algorithm outperforms the other methods and provides the best overall performance in terms of classification accuracy and the number of hidden neurons. The results also demonstrate the contribution of the proposed penalty approach to the simplicity and generalization ability of the generated networks.

Introduction

Artificial neural networks (ANNs) emerged in the 1960s as a novel way to mimic the human brain. The learning ability of ANNs makes them a powerful tool for various applications, such as classification (Zhang, 2000, Cantu-Paz and Kamath, 2005, Castellani and Rowlands, 2009, Rivero et al., 2010, Castellani, 2013), clustering (Du, 2010), vision (Weimer et al., 2013, Zaidan et al., 2014), control systems (Li et al., 2014, Czajkowski et al., 2014), prediction (Muttil and Chau, 2006, Chen et al., 2008, Chen and Wang, 2010, Taormina et al., 2012, Weizhong, 2012, Khashei and Bijari, 2012, Coban, 2013), and many others (Rabuñal and Dorado, 2005). Designing the network architecture and training it are the most important problems in exploiting ANNs. In supervised problems, training is the adaptation of the network weights so that the ANN maps a set of predefined input patterns to the desired corresponding outputs. The backpropagation (BP) algorithm (Rumelhart et al., 1986) is the best-known ANN training algorithm. Designing the ANN topology consists of selecting effective input features, determining the number of hidden neurons, and fixing the connectivity pattern of the neurons; the topology affects the network's learning capacity and generalization, and its design usually needs to be performed by experts, who have to experiment with different topologies to find a suitable one.
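For concreteness, the following minimal sketch illustrates this supervised training task for a one-hidden-layer feedforward network trained with BP; the layer sizes, activations, learning rate and the XOR toy task are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of supervised ANN training with backpropagation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    """Append a constant 1 column so biases become ordinary weights."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

def train_bp(X, y, n_hidden=5, lr=0.5, epochs=5000):
    """Adapt the weights so the network maps inputs X to targets y."""
    n_in = X.shape[1] + 1                            # +1 for the input bias
    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))      # input -> hidden
    W2 = rng.normal(0.0, 0.5, (n_hidden + 1, 1))     # hidden -> output (+bias)
    Xb = add_bias(X)
    for _ in range(epochs):
        h = add_bias(np.tanh(Xb @ W1))               # hidden activations
        out = sigmoid(h @ W2)                        # network output
        delta2 = (out - y) * out * (1.0 - out)       # output error signal
        delta1 = (delta2 @ W2[:-1].T) * (1.0 - h[:, :-1] ** 2)  # backprop
        W2 -= lr * (h.T @ delta2)                    # gradient steps
        W1 -= lr * (Xb.T @ delta1)
    return W1, W2

# Toy usage: the classic XOR mapping.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])
W1, W2 = train_bp(X, y)
print(sigmoid(add_bias(np.tanh(add_bias(X) @ W1)) @ W2).round(2))
```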

Evolutionary neural networks (ENNs) refer to a class of research in which evolutionary algorithms (EAs) are used for ANN design and/or training. EAs are population-based, stochastic search algorithms that simulate natural evolution. Because EAs search globally and can handle the infinitely large, non-differentiable and multimodal search space of architectures, this field has attracted many researchers and ANN developers.
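The population-based loop shared by the EAs discussed here can be sketched as follows; the fitness interface, truncation selection and parameter values are placeholder assumptions for illustration.

```python
# Generic skeleton of a population-based evolutionary search.
import random

def evolve(fitness, init, mutate, crossover, pop_size=50, generations=100):
    """Generic EA loop: evaluate, select, recombine, mutate, repeat."""
    pop = [init() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                    # lower fitness = better
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        pop = parents + children
    return min(pop, key=fitness)
```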

ENN research can be divided into three major groups. The first is the adaptation of the ANN weights by EAs, a form of ANN training. The second group is the use of EAs for designing the architecture of ANNs, and the third is the Topology and Weight Evolving of ANNs (TWEANN), which includes methods proposed to simultaneously evolve the weights and the architecture. In the following, some of the important research lines in ENNs are reviewed; more complete and state-of-the-art reviews can be found in Yao (1999) and Azzini and Tettamanzi (2011).

In the evolutionary training of ANNs, the network architecture is first determined, usually by experts. Since the training of ANNs can be formulated as a search problem, the EA is used to explore and exploit the search space to find an optimized set of weights. Binary representation of the connection weights is one of the earlier approaches in this area (Whitley, 1989, Caudell and Dolan, 1989, Whitley et al., 1990, Cantu-Paz and Kamath, 2005), although some researchers have used suitable real-encoded EAs (Montana and Davis, 1989, Deb et al., 2002). Since evolution strategy (ES) and evolutionary programming (EP) algorithms are well suited to real-vector optimization, they have also been employed in this area (Fogel et al., 1995, Heidrich-Meisner and Igel, 2009). ANN training is a well-known benchmark for new approaches developed in the EA research area; the generalized generation gap parent-centric recombination (G3PCX) algorithm introduced by Deb et al. (2002), for example, was used in the empirical study of Cantu-Paz and Kamath (2005) for comparison with a binary-encoded genetic algorithm (GA). In order to improve the ANN training process, a local search may be embedded into the EA (Topchy and Lebedko, 1997).
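As an illustration of this line of work, the sketch below trains the weights of a fixed one-hidden-layer network with a real-encoded GA; the arithmetic crossover, Gaussian mutation and parameter values are generic choices, not the operators of any specific cited method.

```python
# Sketch of evolutionary ANN training: the topology is fixed in advance
# and each chromosome is the flattened real-valued weight vector.
import numpy as np

rng = np.random.default_rng(1)

def decode(w, n_in, n_hid):
    """Split a flat chromosome into the two weight matrices."""
    W1 = w[: n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid :].reshape(n_hid, 1)
    return W1, W2

def mse(w, X, y, n_in, n_hid):
    """Fitness: mean squared error of the decoded network on (X, y)."""
    W1, W2 = decode(w, n_in, n_hid)
    out = np.tanh(np.tanh(X @ W1) @ W2)
    return float(np.mean((out - y) ** 2))

def ga_train(X, y, n_hid=3, pop=40, gens=200, sigma=0.3):
    n_in = X.shape[1]
    dim = n_in * n_hid + n_hid
    P = rng.normal(0, 1, (pop, dim))                 # initial population
    for _ in range(gens):
        fit = np.array([mse(w, X, y, n_in, n_hid) for w in P])
        elite = P[np.argsort(fit)[: pop // 2]]       # truncation selection
        n_kids = pop - len(elite)
        pa = elite[rng.integers(0, len(elite), n_kids)]
        pb = elite[rng.integers(0, len(elite), n_kids)]
        alpha = rng.random((n_kids, 1))
        kids = alpha * pa + (1 - alpha) * pb         # arithmetic crossover
        kids += rng.normal(0, sigma, kids.shape)     # Gaussian mutation
        P = np.vstack([elite, kids])
    fit = np.array([mse(w, X, y, n_in, n_hid) for w in P])
    return P[np.argmin(fit)]

# Toy usage on random data with a learnable target.
X = rng.normal(0, 1, (30, 4))
y = np.tanh(X @ rng.normal(0, 1, (4, 1)))
w = ga_train(X, y)
print(mse(w, X, y, 4, 3))
```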

The architecture of an ANN is of great importance because it affects the learning capacity and generalization capability of the network. Gradient-based search approaches such as constructive and destructive algorithms may be used to design the architecture of ANNs automatically (Frean, 1990, Sietsma and Dow, 1991). Nevertheless, the main drawback of such methods is that they are quite susceptible to falling into local optima (Angeline et al., 1994).

In the evolutionary design of the architecture, there are two approaches for representing solutions. In the first approach, called direct encoding, all of the network architecture details are encoded in a chromosome. Viewing the ANN as a directed graph and using an adjacency matrix to represent its genotype is a common direct encoding method, in which each entry is a binary number representing the presence or absence of a connection between two nodes (Miller et al., 1989, Kitano, 1990, Belew et al., 1991, Cantu-Paz and Kamath, 2005). This representation can also be employed to prune the connections of a network trained with a full connectivity pattern (Reed, 1993). The other approach is indirect encoding, where only some characteristics of the architecture are encoded in a chromosome and some aspects of the destination network and/or the network generation (mapping process) are predefined. For example, if we know that a fully connected architecture is suitable for our problem, it is sufficient to encode only the number of hidden layers, the number of neurons in each layer and the parameters of the BP algorithm in a chromosome. The knowledge added to the algorithm reduces the search space and usually leads to a compact encoding.
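A minimal sketch of direct encoding along these lines is shown below; the node ordering and the restriction to feedforward (upper-triangular) connections are illustrative assumptions.

```python
# Sketch of direct encoding: the genotype is a binary adjacency matrix.
import numpy as np

n_nodes = 6                                      # e.g. 2 inputs, 3 hidden, 1 output
rng = np.random.default_rng(2)

# Entry (i, j) = 1 means a connection from node i to node j. Keeping only
# the strict upper triangle (k=1) rules out self-loops and cycles, i.e. it
# restricts the graph to feedforward form.
genotype = np.triu(rng.integers(0, 2, (n_nodes, n_nodes)), k=1)

# Decode genotype to phenotype: enumerate the connections that are present.
connections = [(i, j) for i in range(n_nodes)
               for j in range(n_nodes) if genotype[i, j] == 1]
print(connections)
```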

There are several types of indirect encoding in the literature. Kitano (1990) introduced a grammar-based indirect representation that encodes production rules in a chromosome instead of an adjacency matrix of the network. In the genotype-to-phenotype mapping, the production rules are decoded to generate the adjacency matrix; transforming the matrix into the corresponding network is then straightforward. A compact genotype is the main advantage of this method. Siddiqi and Lucas (1998) have shown that direct encoding can be at least as good as Kitano's method. Fractal representation, inspired by the regularity, symmetry and self-similarity of living organisms, is another indirect encoding scheme that may be more plausible than other encoding schemes (Merrill and Port, 1991). Gruau (1993, 1994) introduced cellular encoding, an indirect representation motivated by cell division in biology.
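The following much-simplified sketch conveys the flavor of Kitano-style grammatical encoding: each rule rewrites a symbol into a 2x2 block, so repeated rewriting grows a binary adjacency matrix. The hand-made rule set is purely illustrative; in Kitano's method the rules themselves are encoded in the chromosome.

```python
# Much-simplified sketch of grammar-based (Kitano-style) indirect encoding.
import numpy as np

rules = {
    "S": [["A", "B"], ["B", "A"]],           # nonterminal rewrites
    "A": [["1", "0"], ["0", "1"]],           # terminal 2x2 blocks
    "B": [["0", "0"], ["0", "1"]],
}

def expand(symbol):
    """Recursively rewrite a symbol down to a binary matrix."""
    if symbol in ("0", "1"):
        return np.array([[int(symbol)]])
    block = [[expand(s) for s in row] for row in rules[symbol]]
    return np.block(block)

adjacency = expand("S")                      # 4x4 adjacency matrix
print(adjacency)
```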

Cantu-Paz and Kamath (2005) have empirically evaluated, in addition to ENN methods for ANN training, various ENN methods for feature selection (Kohavi and John, 1997, Yang and Honavar, 1998) and ANN design on classification problems. Yang and Chen (2012) have proposed an evolutionary constructive and pruning algorithm to design the network topology, whose weights are optimized through BP. Furthermore, Soltanian et al. (2013) have applied a grammatical evolution (GE) algorithm (Ryan et al., 1998), called GE-BP, to design the architecture of an ANN, where the BP algorithm is used to evaluate the network against the training data.

Yao and Liu (1997) have developed a TWEANN method based on EP, called EPNet, for simultaneously evolving the architecture and weights of an ANN, in which no crossover operation is utilized. NeuroEvolution of Augmenting Topologies (NEAT), presented by Stanley and Miikkulainen (2002), is another successful system in this area. A hypercube-based NEAT with an indirect encoding has also been introduced by Stanley et al. (2009); a compact genotype, input as well as output scalability, and exploiting the geometry of the problem domain are the most important advantages of their system. Motsinger et al. (2006) and Tsoulos et al. (2008) have used GE for constructing and training ANNs with one hidden layer. Castellani and Rowlands (2009) have developed a TWEANN method for recognizing wood veneer defects and reported no difference in accuracy between architectures using one and two hidden layers. Rivero et al. (2010) have applied genetic programming (GP) to design and train a feedforward ANN with an arbitrary architecture, and Oong and Isa (2011) have presented a global-local balanced GA to simultaneously design and train an arbitrarily connected feedforward ANN. More recently, Castellani (2013) has compared evolutionary ANN design and whole ANN development algorithms with classical feature selection and design methods.

However, there are also other types of ANN and EA combinations in the literature. Evolving the neuron transfer functions (Stork et al., 1990), the learning rule and the parameters of BP (Bengio et al., 1990, Baxter, 1992), and ANN ensembles (Yao and Islam, 2008, Huanhuan and Yao, 2010, Felice and Yao, 2011, Donate et al., 2013, Ghazikhani et al., 2013) are further research lines in ANN evolution. ANN ensembles originate from the idea of divide-and-conquer algorithms. Experiments have shown that EAs are a good choice for automatically dividing the problem space, and thus ENN ensembles have attracted many researchers' attention in the literature (Yao and Islam, 2008).

The major drawback of the ANN training algorithms described in Section 1.1 is the expert effort needed to design the network topology. The major disadvantage of the ANN designing algorithms described in Section 1.2 is that the search space is complex and noisy, because the fitness assigned to a given architecture depends on the learning method. The methods described in Section 1.3 are free from these problems; however, most of them cannot exploit problem domain knowledge, while some of them are extremely dependent on the expert.

As mentioned earlier, the TWEANN method proposed by Tsoulos et al. (2008) uses GE both for designing the network topology and for optimizing its weights. That is, their method encodes the network topology together with its weights using a context-free grammar (CFG) in Backus–Naur form (BNF). The GE approach, first introduced by Ryan et al. (1998), has been applied to successfully solve a range of problems (see, e.g., Ryan et al., 2002, O'Neill and Ryan, 2003, Motsinger et al., 2006, Chen et al., 2008, Chen and Wang, 2010). As stated by Tsoulos et al. (2008), using GE to evolve ANNs has the benefit of allowing easy shaping of the resulting search, in addition to leading to a compact encoding. However, GE does not seem to be well suited to real-vector optimization, i.e., to optimizing the connection weights, and may cause problems such as highly destructive variation operators that destroy information evolved during the search. Accordingly, the algorithm introduced by Soltanian et al. (2013) uses GE only for designing the network topology, while the weights are optimized through BP. Nevertheless, this type of combination leads to a complex and noisy landscape, the major disadvantage of ANN designing algorithms. In order to overcome the drawbacks of the methods of Tsoulos et al. (2008) and Soltanian et al. (2013), this paper proposes a new TWEANN method, called GEGA, in which GE is adopted to design the network topology while a GA is incorporated for weight adaptation. Indeed, GEGA employs a hybrid of direct and indirect encodings: a grammatical encoding to indirectly represent the topology and a real encoding to directly represent the connection weights, with the aim of achieving a better global–local search in the space of weights. To the best of our knowledge, this is the first paper to apply such a combination to the whole development of ANNs. It is noteworthy that, although there are other algorithms for real-vector optimization (such as ES and EP), the GA is well matched for weight adaptation because GE is essentially a GA with a grammatical encoding; the similarities between GA and GE therefore allow us to combine them in one algorithm. Like the methods of Tsoulos et al. (2008) and Soltanian et al. (2013), GEGA is capable of generating any feedforward ANN with one hidden layer. GEGA can utilize expert knowledge about the problem at hand to search the infinite space of topologies more efficiently, although it requires only minimal expert effort for customization. Moreover, another contribution of this paper is a novel adaptive penalty approach that encourages smaller topologies. This allows GEGA to generate much simpler ANNs that have better generalization ability and are also easier to implement. To evaluate the performance of GEGA, extensive experiments are performed on real-world classification datasets, and the results are statistically compared against other well-known and state-of-the-art algorithms in the literature.
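For illustration, the following toy sketch shows how such a hybrid chromosome can be organized: a codon string mapped through a BNF-style grammar to a one-hidden-layer topology, paired with a real-valued vector for the connection weights. The grammar, codon values and mapping details below are illustrative assumptions, not the actual GEGA grammar.

```python
# Toy sketch of a hybrid GE/GA chromosome: codons for the topology,
# reals for the weights.
import numpy as np

# BNF-style grammar: a hidden layer is one or more neurons, each wired to
# some subset of two inputs; "|" separates neurons in the derived string.
grammar = {
    "<hidden>": [["<neuron>"], ["<neuron>", "|", "<hidden>"]],
    "<neuron>": [["x1"], ["x2"], ["x1", "x2"]],
}

def ge_map(codons, start="<hidden>", max_wraps=2):
    """Standard GE mapping: each codon, modulo the number of productions
    for the leftmost nonterminal, selects the rule to apply."""
    seq, out, i = [start], [], 0
    budget = len(codons) * (max_wraps + 1)       # guards against endless wrapping
    while seq and budget > 0:
        sym = seq.pop(0)
        if sym in grammar:
            prods = grammar[sym]
            seq = list(prods[codons[i % len(codons)] % len(prods)]) + seq
            i += 1
            budget -= 1
        else:
            out.append(sym)
    return out

topology_codons = [7, 3, 11, 4, 2, 9]            # GE (topology) part
weight_genes = np.random.default_rng(3).normal(0, 1, 8)  # GA (real) part
print(ge_map(topology_codons))                   # -> ['x1', '|', 'x2', '|', 'x1']
```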

The rest of the paper is organized as follows. The next section describes the proposed algorithm. The contributions of some GEGA components and computational results are analyzed in Section 3, followed by Section 4 discussing the characteristics of GEGA. Finally, Section 5 concludes the paper.

Section snippets

Proposed algorithm

The topology and the connection weights of an ANN are simultaneously determined by GEGA. The network topology includes the neurons in the input layer, i.e., the selected features, the neurons in the hidden layer, and the connectivity among the neurons. Since simpler ANNs are expected to have better generalization ability, a penalty approach is proposed to simplify the ANNs through the evolution process. The general structure of GEGA, whose output is a feedforward ANN, is shown in Fig. 1. The…
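A minimal sketch of a penalized fitness of this kind is given below; the linear adaptive schedule and its parameter are assumptions for illustration, not the paper's actual penalty formula.

```python
def penalised_fitness(error, n_hidden, generation, max_generations, lam_max=0.1):
    """Error plus a size penalty whose weight grows over the run, so larger
    topologies are tolerated early but discouraged later (illustrative)."""
    lam = lam_max * generation / max_generations  # assumed linear schedule
    return error + lam * n_hidden                 # simpler networks preferred
```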

Performance evaluation

To validate GEGA and to make comparisons with other ENN algorithms, seven well-known classification problems used by most methods are chosen from the University of California at Irvine (UCI) repository of machine learning databases (Blake and Merz, 1998). Prior to conducting the experiments, however, the numeric features in the data are linearly normalized to the interval [−1, +1], and the nominal features (except for the class labels) are encoded with the 1-in-C coding, where for a C-class problem one…
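The stated preprocessing can be sketched as follows; the helper names are ours.

```python
# Sketch of the preprocessing: linear rescaling to [-1, +1] for numeric
# features and 1-in-C (one-hot) coding for nominal features.
import numpy as np

def normalise(col):
    """Linearly map a numeric column onto [-1, +1]."""
    lo, hi = col.min(), col.max()
    return 2.0 * (col - lo) / (hi - lo) - 1.0

def one_in_c(col):
    """Encode a nominal column with C categories as C binary columns."""
    cats = sorted(set(col))
    return np.array([[1.0 if v == c else 0.0 for c in cats] for v in col])

print(normalise(np.array([1.0, 3.0, 5.0])))      # -> [-1.  0.  1.]
print(one_in_c(["red", "blue", "red"]))          # 2 categories -> 2 columns
```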

Discussion

As shown in the previous section, GEGA, which evolves both the topology and the connection weights of ANNs, provides the best overall performance among the ENN methods in the study in terms of classification accuracy and the number of hidden neurons. In this section, some other characteristics of GEGA are described in detail.

Conclusions and future work

To simultaneously evolve the topology and the connection weights of ANNs, this paper proposes a new combination of GE and GA that requires minimal expert effort for customization. To efficiently search the infinite space of topologies as well as connection weights, the proposed system uses a grammatical encoding for the topology representation and a real encoding for the weights representation in a chromosome. Using GE to design the network topology can also help the user introduce other…

References (68)

  • V. Heidrich-Meisner et al., Neuroevolution strategies for episodic reinforcement learning, J. Algorithms (2009).
  • M. Khashei et al., Hybridization of the probabilistic neural networks with feed-forward neural networks for forecasting, Eng. Appl. Artif. Intell. (2012).
  • R. Kohavi et al., Wrappers for feature subset selection, Artif. Intell. (1997).
  • X.L. Li et al., Nonlinear adaptive control using multiple models and dynamic neural networks, Neurocomputing (2014).
  • J.W.L. Merrill et al., Fractally configured neural networks, Neural Netw. (1991).
  • D. Rivero et al., Generation and simplification of artificial neural networks by means of genetic programming, Neurocomputing (2010).
  • J. Sietsma et al., Creating artificial neural networks that generalize, Neural Netw. (1991).
  • R. Taormina et al., Artificial neural network simulation of hourly groundwater levels in a coastal aquifer system of the Venice lagoon, Eng. Appl. Artif. Intell. (2012).
  • A.P. Topchy et al., Neural network training by means of cooperative evolutionary search, Nucl. Instrum. Methods Phys. Res., Sect. A (1997).
  • I. Tsoulos et al., Neural network construction and training using grammatical evolution, Neurocomputing (2008).
  • D. Weimer et al., Learning defect classifiers for textured surfaces using neural networks and statistical feature representations, Proc. CIRP (2013).
  • D. Whitley et al., Genetic algorithms and neural networks: optimizing connections and connectivity, Parallel Comput. (1990).
  • S.H. Yang et al., An evolutionary constructive and pruning algorithm for artificial neural networks and its prediction applications, Neurocomputing (2012).
  • A.A. Zaidan et al., Image skin segmentation based on multi-agent learning Bayesian and neural network, Eng. Appl. Artif. Intell. (2014).
  • P.J. Angeline et al., An evolutionary algorithm that constructs recurrent neural networks, IEEE Trans. Neural Netw. (1994).
  • A. Azzini et al., Evolutionary ANNs: a state of the art survey, Intelligenza Artificiale (2011).
  • J. Baxter, The evolution of learning algorithms for artificial neural networks.
  • R. Belew et al., Evolving networks: using the genetic algorithm with connectionist... (1991).
  • Y. Bengio et al., Learning a Synaptic Learning Rule (1990).
  • C.L. Blake et al., UCI Repository of Machine Learning... (1998).
  • E. Cantu-Paz et al., An empirical comparison of combinations of evolutionary algorithms and neural networks for classification problems, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2005).
  • T.P. Caudell et al., Parametric connectivity: training of constrained networks using genetic algorithms... (1989).
  • L. Chen et al., Modeling strength of high-performance concrete using an improved grammatical evolution combined with macrogenetic algorithm, J. Comput. Civil Eng. (2010).
  • K. Deb et al., A computationally efficient evolutionary algorithm for real-parameter optimization, Evol. Comput. (2002).