Using a small number of training instances in genetic programming for face image classification

doi:10.1016/j.ins.2022.01.055

Information Sciences

Volume 593, May 2022, Pages 488-504


Abstract

Classifying faces is a difficult task due to image variations in illumination, occlusion, pose, expression, etc. Typically, it is challenging to build a generalised classifier when the training data is small, which can result in poor generalisation. This paper proposes a new approach for the classification of face images based on multi-objective genetic programming (MOGP). In MOGP, image descriptors that extract effective features are automatically evolved by optimising two different objectives at the same time: the accuracy and the distance measure. The distance measure is a new measure intended to enhance generalisation of learned features and/or classifiers. The performance of MOGP is evaluated on eight face datasets. The results show that MOGP significantly outperforms 17 competitive methods.

Introduction

Face image classification has many important applications in security, criminal detection and surveillance [1]. This task includes facial expression classification and face recognition. Facial expression classification aims to identify different facial expressions from face images, which is an important task for human motion analysis and communication [2]. Face recognition aims to classify face images from different people into groups [3]. Face images are often collected in different environments, so they vary in pose, illumination, occlusion, facial expression and other facial details, which makes the task difficult.

Typically, face features extracted from images contain discriminative information, which can make the images easier to classify. Typical methods include scale-invariant feature transform (SIFT) [4], local binary patterns (LBP) [5], eigenfaces [6], and Fisherfaces [6]. Recent advances in feature learning have enabled automatic characterisation of images by learning effective features rather than manually identifying them [7], [8]. Commonly used methods are convolutional neural networks (CNNs) [9], dictionary learning [10], and genetic programming (GP) [8]. However, learning features from images is challenging because of the large search space and high image variations.

Collecting and labelling large numbers of face images for training is often expensive or difficult due to privacy, security or other concerns. When the training data are insufficient, it is difficult for features and classifiers to generalise effectively. Popular image classification methods, e.g., neural network (NN)-based algorithms, often require sufficiently large training sets because of their huge number of trainable parameters [11], [12]. These methods are often combined with other strategies, such as data augmentation, transfer learning and meta-learning [13], to improve generalisation. However, these strategies are not always effective and need strong assumptions. For example, most data augmentation-based methods assume that the newly generated data have the same distribution as the training data, and transfer learning-based methods assume that the source domains/tasks are similar or related to the target domains/tasks. To this end, this paper aims to solve face image classification using only small training data. Instead of using NNs, which need sufficient data to train, we use GP to solve face image classification.

Evolutionary computation (EC) studies algorithms inspired by biological evolution and social intelligence to solve real-world problems [14], [15], [16]. As an EC technique, GP typically evolves variable-length computer programs to solve problems [17]. GP has good global search abilities and does not require a differentiable objective function. GP solutions are known for their flexible complexity and high interpretability. There is significant potential for GP to learn general image features [8], [18], [19]. However, GP has rarely been investigated for face image classification using small training data.

Existing GP methods also face the issue of poor generalisation when the training data is small. In most GP-based methods [20], the fitness function measures the accuracy on the training set. When the training set is small, it may be easy to obtain perfect training accuracy (fitness value), i.e., 100%, at the very beginning of evolution, but the learned model often has poor generalisation. To improve generalisation, this paper develops a new distance measure for GP fitness evaluation, in addition to the classification accuracy measure. Since the relationship between the accuracy and the distance is unknown, multi-objective optimisation algorithms, which simultaneously optimise multiple objective functions, can be used to handle both. EC techniques are the main approach for multi-objective optimisation and have shown promise in many problems [21].

A multi-objective GP (MOGP) method is proposed in this paper to classify face images on small training sets. The MOGP approach learns facial features by maximising two objectives, namely classification accuracy and distance measure. The second objective is a new metric based on different distances, aiming to improve the generalisation ability of learned features and/or classifiers. MOGP searches for multiple Pareto optimal solutions using the idea of non-dominated sorting. MOGP will be tested on eight face image datasets, including face classification and facial expression classification, with several images per class for training. MOGP will be compared with two GP methods and 15 non-GP methods to demonstrate its effectiveness. There are two main contributions:

• A new distance measure is developed as an objective function to maximise the inter-class distance and minimise the intra-class distance. By performing such an optimisation, the measure enhances the generalisation ability of the learning system when the training data is small.
• A MOGP algorithm is proposed to automatically learn facial features while maximising classification accuracy and a new distance measure.

The proposed approach can automatically generate a dynamic number of global and/or local features from small-scale images while maximising classification accuracy, maximising the inter-class distance and minimising the intra-class distance, thereby improving generalisation performance. The proposed approach is simple and does not need any assumptions. It can achieve high classification accuracy on different face image classification tasks. Furthermore, it can evolve human-interpretable solutions, showing the process of feature extraction.
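The idea behind the second objective (a large inter-class distance and a small intra-class distance) can be illustrated with a short sketch. This is not the paper's distance measure, whose definition is not reproduced in this excerpt. The Euclidean distance, the pairwise averaging and the hypothetical `distance_score` ratio are all assumptions chosen only to make the idea concrete.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_distances(features, labels):
    """Return (mean intra-class distance, mean inter-class distance) over all pairs."""
    intra, inter = [], []
    for (fa, la), (fb, lb) in combinations(zip(features, labels), 2):
        (intra if la == lb else inter).append(euclidean(fa, fb))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


def distance_score(features, labels):
    """Hypothetical objective in [0, 1] (an assumption, not the paper's equation).

    It grows when classes are far apart (large inter-class distance)
    and compact (small intra-class distance).
    """
    intra, inter = class_distances(features, labels)
    total = intra + inter
    return inter / total if total > 0 else 0.0


# Made-up 2-D features for two classes with two training instances each.
labels = [0, 0, 1, 1]
compact = [[0, 0], [0, 1], [5, 5], [5, 6]]   # classes well separated
mixed = [[0, 0], [4, 4], [5, 5], [1, 1]]     # classes overlap

print(distance_score(compact, labels))
print(distance_score(mixed, labels))
```

Both feature sets could be classified perfectly on such a tiny training set, so accuracy alone would not separate them. A distance-based objective of this kind prefers the compact, well-separated features.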

Section snippets

Multi-objective optimisation

Multi-objective optimisation problems often maximise or minimise multiple (potentially) conflicting objectives at the same time and can be expressed as

$\text{minimise}\ F(x) = \{f_{1}(x), f_{2}(x), \dots, f_{k}(x)\}$

subject to:

$g_{i}(x) \leq 0, \quad i = 1, 2, \dots, m$

$h_{i}(x) = 0, \quad i = 1, 2, \dots, n$

where $f_{1}(x), f_{2}(x), \dots, f_{k}(x)$ denote the $k$ ($k > 1$) objectives and $x$ represents the decision variables. $g(x)$ and $h(x)$ denote the inequality and equality constraints, and $m$ and $n$ are the numbers of constraints of each type, respectively.
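As a minimal sketch of how non-dominated solutions are identified under the formulation above, the following code filters a set of candidate objective vectors. It assumes unconstrained maximisation of every objective, as with the accuracy and distance objectives in MOGP. A minimisation problem can be handled by negating the objectives. The function names and objective values are illustrative only.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Made-up (accuracy, distance) pairs for four candidate programs.
candidates = [(1.00, 0.40), (1.00, 0.70), (0.95, 0.80), (0.90, 0.60)]
print(non_dominated(candidates))  # [(1.0, 0.7), (0.95, 0.8)]
```

Neither of the two remaining solutions is better than the other on both objectives, which is why a multi-objective search returns a set of trade-off solutions rather than a single best one.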

The Pareto front usually contains many non-dominated solutions that can be

The proposed approach

This section presents the new MOGP approach in detail, i.e., the individual representation, the objective functions and the overall algorithm.

Benchmark methods

To demonstrate the effectiveness, we compare MOGP with two GP methods and 15 non-GP methods, which are

• a single-objective GP method that only optimises the classification accuracy defined as Eq. (5). This method is termed SGP1. The individual representation is the same as MOGP;
• a single-objective GP method that only optimises the distance measure defined as Eq. (8). This method is termed SGP2. The individual representation is the same as MOGP;
• four different classification algorithms using raw

Results and discussions

The classification accuracies (%) obtained by MOGP, the two single-objective GP algorithms and the other 15 non-GP methods are listed in Tables 4 and 5. Statistical significance is assessed with the Wilcoxon rank-sum test at a significance level of 0.05. In Tables 4 and 5, the symbols “+”, “–” and “=” indicate that MOGP performs significantly better than, significantly worse than, or similarly to the corresponding method. The last rows of these tables summarise the results of the significance tests.
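The following sketch shows how such a comparison could be mapped to the three symbols. It is a plain standard-library implementation of the two-sided Wilcoxon rank-sum test using the normal approximation without tie correction. This is an assumption about the test variant, since the excerpt does not say which one the authors used. The function names are hypothetical and the accuracy values are invented for illustration, not taken from the paper's tables.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (z statistic, p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p


def compare(mogp_acc, other_acc, alpha=0.05):
    """Return '+', '-' or '=' from the point of view of the first sample."""
    z, p = rank_sum_test(mogp_acc, other_acc)
    if p >= alpha:
        return "="
    return "+" if z > 0 else "-"


# Invented accuracies (%) from ten independent runs of two methods.
method_a = [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
method_b = [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
print(compare(method_a, method_b))  # '+'
```

The normal approximation is reasonable for moderately sized samples. For very small numbers of runs, an exact version of the test would be the safer choice.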

Further analysis

This section further analyses MOGP in terms of the approximated Pareto front, the number of learned features, computation time, parameter sensitivity, and the evolved programs/trees.

Conclusions

This paper developed a MOGP algorithm that maximises the objectives of classification accuracy and a distance measure for face image classification using a small training set. The effectiveness of MOGP has been evaluated on eight face datasets. The results showed that MOGP outperformed two single-objective GP algorithms and 15 non-GP methods on these datasets. The results demonstrated that MOGP was effective for feature learning from small training data for face image classification.

The

CRediT authorship contribution statement

Ying Bi: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Bing Xue: Writing – review & editing, Supervision, Project administration, Funding acquisition. Mengjie Zhang: Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the Marsden Fund of New Zealand Government under Contracts VUW1509 and VUW1615, the Science for Technological Innovation Challenge (SfTI) fund under contract 2019-S7-CRS, the University Research Fund at Victoria University of Wellington Grant No. 216378/3764 and 223805/3986, MBIE Data Science SSIF Fund under the contract RTVU1914, and National Natural Science Foundation of China (NSFC) under Grant 61876169.

References (50)

F.Z. Canal et al., A survey on facial emotion recognition techniques: A state-of-the-art literature review, Inf. Sci. (2022)
F. Shen et al., Face image classification by pooling raw features, Pattern Recogn. (2016)
Z. Zhang et al., An efficient interval many-objective evolutionary algorithm for cloud task scheduling problem under uncertainty, Inf. Sci. (2022)
B. Niu et al., Swarm intelligence algorithms for yard truck scheduling and storage allocation problems, Neurocomputing (2016)
Y. Bi et al., Multi-objective genetic programming for feature learning in face recognition, Appl. Soft Comput. (2021)
C. Shao et al., Dynamic dictionary optimization for sparse-representation-based face classification using local difference images, Inf. Sci. (2017)
Y. Ren et al., Facial semantic descriptors based on information granules, Inf. Sci. (2019)
M. Xue et al., A semantic facial expression intensity descriptor based on information granules, Inf. Sci. (2020)
G.F. Plichoski et al., A face recognition framework based on a pool of techniques and differential evolution, Inf. Sci. (2021)
A.K. Jain et al., Handbook of face recognition (2011)
D.G. Lowe, Distinctive image features from scale-invariant keypoints, Proc. Int. J. Comput. Vis. (2004)
T. Ahonen et al., Face description with local binary patterns: Application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces, Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
Y. Duan et al., Context-aware local binary feature learning for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
Y. Bi et al., Genetic Programming for Image Classification: An Automated Approach to Feature Learning (2021)
W. Rawat et al., Deep convolutional neural networks for image classification: A comprehensive review, Neural Comput. (2017)
G. Zhang, J. Yang, Y. Zheng, Z. Luo, J. Zhang, Optimal discriminative feature and dictionary learning for image set...
Y. Bi et al., Dual-tree genetic programming for few-shot image classification, IEEE Trans. Evol. Comput. (2021)
Y. Bi et al., Learning and sharing: A multitask genetic programming approach to image feature learning, IEEE Trans. Evol. Comput. (2021)
Y. Wang et al., Generalizing from a few examples: A survey on few-shot learning, ACM Comput. Surv. (2020)
B. Niu et al., Structure-redesign-based bacterial foraging optimization for portfolio selection, in
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
H. Al-Sahaf et al., A survey on evolutionary machine learning, J. R. Soc. New Zealand (2019)
H. Al-Sahaf et al., Keypoints detection and feature extraction: A dynamic genetic programming approach for evolving rotation-invariant texture image descriptors, IEEE Trans. Evol. Comput. (2017)
Y. Bi et al., Genetic programming with a new representation to automatically learn features and evolve ensembles for image classification, IEEE Trans. Cybern. (2021)

