Elsevier

Food Control

Volume 144, February 2023, 109389

Predictions of multiple food quality parameters using near-infrared spectroscopy with a novel multi-task genetic programming approach

https://doi.org/10.1016/j.foodcont.2022.109389

Abstract

To meet the increasing demand for food safety and quality, the food industry urgently needs new methods for the simultaneous and rapid determination of multiple food quality parameters (FQPs). Combining near-infrared (NIR) spectroscopy with spectral prediction models enables rapid, repeatable, non-destructive, and low-cost quantitative analysis of FQPs, and this approach is increasingly popular in the food industry. However, most existing spectrum-based prediction models are trained under a single-task learning framework; that is, a separate prediction model is constructed for each quality parameter and spectrum. This paradigm ignores possible connections among the prediction tasks for different FQPs, which may degrade the performance of each single-FQP prediction model. This study proposes a novel multi-task genetic programming-based approach named EM4GPO for building predictions of multiple FQPs simultaneously. In EM4GPO, multi-dimensional trees encode the raw NIR spectrum into features shared by multiple FQPs; for each FQP, least squares support vector regression (LS-SVR) modeling is performed on the shared features to obtain private features and a prediction model. During optimization, a new algorithm evolves a population to optimize the shared features, the private features, and the LS-SVR prediction models by combining the multidimensional multiclass genetic programming with multidimensional populations (M3GP) optimization method with the nondominated sorting method. The proposed EM4GPO model is evaluated against nine popular NIR prediction models on 10 NIR spectral datasets. The experimental results showed that EM4GPO outperformed the other commonly used methods on all datasets, indicating that EM4GPO is competitive and effective for predicting multiple FQPs from NIR spectra.

Introduction

As globalization proceeds, the food industry has evolved rapidly toward higher-volume, high-quality, and safe food production (Nakat & Bou-Mitri, 2021). Consequently, the requirements for fast, accurate, and non-destructive assessment of food quality have increased significantly over the past decade. Among the available techniques, near-infrared (NIR) spectroscopy has attracted considerable attention (Cortés, Blasco, Aleixos, Cubero, & Talens, 2019; Lu et al., 2022). It enables quantitative or qualitative detection of food products through indirect measurements (Cai et al., 2021; Li, Jang, Li, & Liu, 2021). The general procedure is to (1) capture the spectral information of food products at different wavelengths by NIR spectroscopy, and (2) develop prediction models that determine food quality parameters (FQPs), such as the chemical content and mechanical properties of food products, from the corresponding spectral responses (Hernández-Hernández, Fernández-Cabanás, Rodríguez-Gutiérrez, Fernández-Prior, & Morales-Sillero, 2022; Shen et al., 2022; Tugnolo et al., 2021; Turgut, Entrenas, Taşkın, Garrido-Varo, & Pérez-Marín, 2022). However, NIR spectra are usually highly complex: they contain many overlapping peaks, which produce broad bands, and their absorptions lie in the region of overtones and combination modes (Manley, 2014). Therefore, developing FQP prediction models is one of the most critical aspects of NIR-based food quality measurement, yet it remains challenging.

Spectral prediction models are usually built within a regression framework. Linear regression techniques, such as multiple linear regression (MLR) (Liew & Lau, 2012) and partial least squares regression (PLSR) (Rossi & Lozano, 2020), have achieved great success in the quantitative evaluation of food quality. In addition, several nonlinear regression approaches, such as least squares support vector regression (LS-SVR) (Li, Huang, Zhao, & Zhang, 2013), regression genetic programming (M3GPSpectra) (Yang, Wang, Zhao, Huang, & Zhu, 2021), and deep convolutional neural networks (DeepSpectra) (Zhang, Lin, Xu, Luo, & Ying, 2019), have been applied to develop prediction models that handle the complex nonlinear relationship between NIR spectral variables and FQPs. These methods, known as single-task learning (STL) methods, focus on estimating a single FQP. Although they are easy to use and have achieved outstanding results, they face two challenges. On the one hand, learning methods, especially deep learning, require extensive sample data and corresponding labels (FQPs). However, the labels are usually obtained through destructive and time-consuming measurement procedures, resulting in a lack of labeled samples and unreliable prediction results (Goisser, Hey, & Kesselheim, 2020; Zhang, Ding, Wang, Guo, & Li, 2020). On the other hand, the food industry often needs to predict multiple FQPs simultaneously. Although this can be accomplished by combining STL methods, STL modeling treats each FQP prediction as an independent task, which blocks information exchange among different FQPs and is typically not conducive to developing high-performance FQP prediction models (Mishra & Passos, 2022).
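Because LS-SVR recurs throughout this work as the per-FQP regressor, a minimal sketch of the standard LS-SVR dual formulation may help. It assumes an RBF kernel and illustrative hyperparameters `gamma` (regularization) and `sigma` (kernel width); the helper names are hypothetical, and the kernel and tuning actually used in the study are not stated here. Training reduces to one linear system instead of the quadratic program of a classical SVR.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVR dual system:
    # [ 0   1^T         ] [ b     ]   [ 0 ]
    # [ 1   K + I/gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


if __name__ == "__main__":
    # Hypothetical 1-D toy data standing in for spectral features
    X = np.linspace(0, 6, 40)[:, None]
    y = np.sin(X[:, 0])
    b, alpha = lssvr_fit(X, y, gamma=100.0, sigma=1.0)
    print(lssvr_predict(X, b, alpha, X[:5], sigma=1.0))
```

Two properties of the solution are easy to check: the dual coefficients sum to zero (first row of the system), and each training residual equals `alpha_i / gamma`, which is how `gamma` trades fit against smoothness.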

To overcome the limitations of STL methods, much effort has been made to explore multi-task learning (MTL) methods that jointly train models for different related tasks. During concurrent learning, each model shares the knowledge it has learned with the other models as meta-information for their training, which alleviates the negative effect of the cold-start problem that small training sets cause for STL methods (Feng, Liu, Wu, & Zuo, 2022). Meanwhile, through information sharing, the tasks constrain each other and avoid overfitting, so that all prediction models can generalize better (Ng et al., 2019). MTL methods such as multi-response partial least-squares regression (PLS2) (Pedro & Ferreira, 2007) and multi-task least-squares support vector regression (MTLSSVR) (Xu, An, Qiao, Zhu, & Li, 2013) have achieved outstanding performance in NIR-based food quality evaluation. However, classical MTL models depend on effective features for FQP prediction, which are difficult to craft manually (Hekler et al., 2019). The emergence of artificial intelligence techniques, such as multi-task deep learning (MTDL) methods and evolutionary multitask optimization (EMTO) algorithms, has greatly promoted the development of multiple-parameter prediction (Sosnin et al., 2019; Zhong, Feng, Cai, & Ong, 2018). Currently, MTDL methods typically employ one-dimensional (1D) convolution modules to extract shared features from the raw spectrum, then map these features to the output layer using a fully connected layer with a linear activation function, finally yielding multiple quality parameters, each corresponding to one output node (Assadzadeh, Walker, McDonald, Maharjan, & Panozzo, 2020; Tsakiridis, Keramaris, Theocharis, & Zalidis, 2020).
As described above, the 1D convolution module extracts shared features from the raw NIR spectrum that contain single-scale local features but no long-distance dependencies, so FQPs mapped from these shared features may be unreliable (Zhao, Wu, & Liu, 2022). Additionally, because deep learning models are data-hungry, MTDL models easily overfit during training (Zhang et al., 2019). In EMTO, common information, usually captured via the evolving population, is used to optimize the related tasks concurrently. This information is transferred across tasks and updated during transfer, so EMTO can learn shared information (shared features) that generalizes to all tasks. This learning paradigm suits scenarios with insufficient data (Zheng, Qin, Gong, & Zhou, 2019). However, EMTO emphasizes the commonality of all tasks while overlooking the individuality of each task, and it may suffer from negative knowledge transfer (Xu, Qin, & Xia, 2021).
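The MTDL architecture described above (a shared 1D convolution module followed by a fully connected layer with linear activation and one output node per FQP) can be illustrated with a NumPy forward pass. This is a minimal sketch under stated assumptions: the weights are random placeholders rather than trained values, a ReLU follows the convolution, and the function names are hypothetical; it is not the architecture of any specific cited model. Note that every filter sees only a small window of wavelengths, which is the single-scale, local receptive field the text criticizes.

```python
import numpy as np


def conv1d_valid(x, kernels):
    # Shared 1D convolution (cross-correlation, as in deep learning
    # frameworks) with "valid" padding: one feature map per kernel
    return np.stack([np.correlate(x, k, mode="valid") for k in kernels])


def mtdl_forward(spectrum, kernels, W, b):
    # Shared feature extractor used by every task
    feats = np.maximum(conv1d_valid(spectrum, kernels), 0.0)  # ReLU
    shared = feats.ravel()
    # Fully connected layer with linear activation: one output per FQP
    return W @ shared + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrum = rng.random(100)                  # hypothetical 100-band spectrum
    kernels = rng.standard_normal((4, 7))       # 4 filters of width 7
    W = rng.standard_normal((3, 4 * (100 - 7 + 1)))  # 3 FQP output nodes
    b = np.zeros(3)
    print(mtdl_forward(spectrum, kernels, W, b))  # three FQP estimates
```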

Motivated by the aforementioned MTL methods, this study proposes an evolutionary multitask multidimensional multiclass genetic programming with multidimensional populations optimization (EM4GPO) algorithm for the simultaneous prediction of multiple FQPs. The main contributions of this work are to (1) use multi-dimensional trees to extract multi-scale, multi-type NIR spectrum-based features shared by multiple FQPs; (2) design an LS-SVR for each FQP that learns private features and a prediction model from the extracted shared features; (3) develop an efficient multi-task genetic programming algorithm, based on the M3GP algorithm and the nondominated sorting method, for jointly optimizing the shared and private features and the LS-SVR prediction models; and (4) validate the effectiveness of EM4GPO on ten NIR spectral datasets.
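In M3GP-style genetic programming, an individual is a set of expression trees, and each tree maps a spectrum to one dimension of a new feature space. The sketch below shows that encoding step only, assuming a nested-tuple tree representation, wavelength-index leaves, constant leaves, and an operator set of addition, subtraction, multiplication, and protected division; these choices and all function names are hypothetical, not the authors' implementation. In the pipeline described above, the resulting feature matrix would play the role of the shared features passed to each FQP's LS-SVR.

```python
import numpy as np


def pdiv(a, b):
    # Protected division, common in GP: returns 1 where |b| is tiny
    out = np.ones_like(a, dtype=float)
    mask = np.abs(b) > 1e-6
    out[mask] = a[mask] / b[mask]
    return out


OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply, "div": pdiv}


def eval_tree(node, X):
    # Leaf ("x", j): spectral value at wavelength index j for every sample
    # Leaf ("c", v): constant v
    # Internal node (op, left, right): op applied to both subtrees
    kind = node[0]
    if kind == "x":
        return X[:, node[1]].astype(float)
    if kind == "c":
        return np.full(X.shape[0], float(node[1]))
    return OPS[kind](eval_tree(node[1], X), eval_tree(node[2], X))


def encode(individual, X):
    # One column per tree: the individual maps each spectrum (row of X)
    # to a len(individual)-dimensional feature vector
    return np.column_stack([eval_tree(tree, X) for tree in individual])


if __name__ == "__main__":
    X = np.array([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0]])        # 2 toy spectra, 4 bands
    individual = [
        ("sub", ("x", 3), ("x", 0)),            # band difference
        ("div", ("x", 1), ("x", 2)),            # band ratio
        ("mul", ("c", 2.0), ("x", 0)),          # scaled band
    ]
    print(encode(individual, X))
```

Because the number of trees per individual can itself evolve in M3GP, the dimensionality of the feature space is searched along with the features.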

Section snippets

Dataset and data partition

Apple dataset: Apple samples of three varieties, ‘Jonagold’ (JG), ‘Golden Delicious’ (GD), and ‘Delicious’ (RD), were harvested in 2009 and 2010 from an orchard at Michigan State University's Clarksville Horticultural Experiment Station in Clarksville, MI. After harvest, the apples were transferred to the U.S. Department of Agriculture Agricultural Research Service (Michigan State University, Michigan, USA) for storage and testing (Mendoza, Lu, & Cen, 2014). The spectral data of apple samples

EM4GPO prediction

As discussed in Section 3.4, the nondominated sorting method plays a dominant role in EM4GPO's search for near-optimal results. Hence, EM4GPO is first compared with the M3GP-LSSVR algorithm, which uses a lexicographic tournament of size 5 to select offspring. Table 3 compares the prediction results of the EM4GPO and M3GP-LSSVR models on the apple and sugar beet datasets. All RPD (ratio of performance to deviation) values obtained by the EM4GPO models are higher than those of M3GP-LSSVR, and for the
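To make the role of nondominated sorting concrete, the sketch below ranks candidate individuals into Pareto fronts using the fast nondominated sort known from NSGA-II, treating each FQP's RPD as a separate objective to maximize. Using per-FQP RPD as the objectives, and computing RPD with the sample standard deviation (`ddof=1`), are assumptions for illustration; the fitness definitions actually used by EM4GPO may differ. The function names are hypothetical.

```python
import numpy as np


def rpd(y_true, y_pred):
    # Ratio of performance to deviation: SD of reference values / RMSEP
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return np.std(y_true, ddof=1) / rmsep


def dominates(a, b):
    # a dominates b: no worse on every FQP and strictly better on at least
    # one (higher RPD is better)
    return bool(np.all(a >= b) and np.any(a > b))


def nondominated_sort(scores):
    # scores: (n_individuals, n_fqps) array of per-FQP RPD values
    # Returns a list of fronts (lists of indices); front 0 is the best
    n = len(scores)
    dominated_set = [[] for _ in range(n)]  # individuals that p dominates
    counts = np.zeros(n, dtype=int)         # how many individuals dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(scores[p], scores[q]):
                dominated_set[p].append(q)
            elif dominates(scores[q], scores[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical RPD values of five individuals on two FQPs
    scores = np.array([[2.0, 3.0], [3.0, 2.0], [1.5, 1.5],
                       [1.0, 2.5], [1.0, 1.0]])
    print(nondominated_sort(scores))
```

Unlike a lexicographic tournament, which orders candidates by one criterion at a time, this ranking keeps individuals that trade accuracy on one FQP for accuracy on another in the same front.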

Conclusions

In this paper, a novel multi-task genetic programming approach, EM4GPO, is developed and combined with NIR spectroscopy to predict multiple FQPs simultaneously. The approach first introduces multi-dimensional trees to encode the raw NIR spectrum, extracting multi-scale and multi-type features shared by multiple FQPs. Then, the private features and prediction model for each FQP are obtained by applying LS-SVR to the shared features. During the modeling process, the shared and private

CRediT authorship contribution statement

Yu Yang: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft. Shangpeng Sun: Writing – review & editing, Supervision. Leiqing Pan: Data curation, Investigation, Supervision. Min Huang: Formal analysis, Supervision, Funding acquisition. Qibing Zhu: Writing – review & editing, Formal analysis, Supervision, Funding acquisition.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 62273166, 61772240, U2003114). The authors thank the China Scholarship Council (CSC) for the financial support to the author (Yu Yang) to conduct his doctoral research in the Department of Bioresource Engineering at McGill University. The authors would like to thank Dr. Renfu Lu of the U.S. Department of Agriculture Agricultural Research Service (USDA/ARS) at Michigan State University, East Lansing, Michigan, for

References (63)

  • L.E.C. La Rosa et al.

    Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data

    ISPRS Journal of Photogrammetry and Remote Sensing

    (2021)
  • A. Lee et al.

    Non-destructive prediction of soluble solid contents in Fuji apples using visible near-infrared spectroscopy and various statistical methods

    Journal of Food Engineering

    (2022)
  • J. Li et al.

    A comparative study for the quantitative determination of soluble solids content, pH and firmness of pears by Vis/NIR spectroscopy

    Journal of Food Engineering

    (2013)
  • L. Li et al.

    Wavelength selection method for near-infrared spectroscopy based on standard-sample calibration transfer of mango and apple

    Computers and Electronics in Agriculture

    (2021)
  • C. Lin et al.

    Determination of grain protein content by near-infrared spectrometry and multivariate calibration in barley

    Food Chemistry

    (2014)
  • F. Mendoza et al.

    Grading of apples based on firmness and soluble solids content using Vis/SWNIR spectroscopy and spectral scattering techniques

    Journal of Food Engineering

    (2014)
  • P. Mishra et al.

    Multi-output 1-dimensional convolutional neural networks for simultaneous prediction of different traits of fruit based on near-infrared spectroscopy

    Postharvest Biology and Technology

    (2022)
  • P. Mishra et al.

    Improving moisture and soluble solids content prediction in pear fruit using near-infrared spectroscopy with variable selection and model updating approach

    Postharvest Biology and Technology

    (2021)
  • Z. Nakat et al.

    COVID-19 and the food industry: Readiness assessment

    Food Control

    (2021)
  • S. Nawar et al.

    Optimal sample selection for measurement of soil organic carbon using on-line vis-NIR spectroscopy

    Computers and Electronics in Agriculture

    (2018)
  • W. Ng et al.

    Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra

    Geoderma

    (2019)
  • J. Padarian et al.

    Using deep learning to predict soil properties from regional spectral data

    Geoderma Regional

    (2019)
  • L. Pan et al.

    Measurement of moisture, soluble solids, sucrose content and mechanical properties in sugar beet using portable visible and near-infrared spectroscopy

    Postharvest Biology and Technology

    (2015)
  • L. Pan et al.

    Determination of sucrose content in sugar beet by portable visible and near-infrared spectroscopy

    Food Chemistry

    (2015)
  • A.M. Pedro et al.

    Simultaneously calibrating solids, sugars and acidity of tomato products using PLS2 and NIR spectroscopy

    Analytica Chimica Acta

    (2007)
  • G.B. Rossi et al.

    Simultaneous determination of quality parameters in yerba mate (Ilex paraguariensis) samples by application of near-infrared (NIR) spectroscopy and partial least squares (PLS)

    Lebensmittel-Wissenschaft & Technologie

    (2020)
  • G. Shen et al.

    Rapid and nondestructive quantification of deoxynivalenol in individual wheat kernels using near-infrared hyperspectral imaging and chemometrics

    Food Control

    (2022)
  • J.A. Suykens et al.

    Optimal control by least squares support vector machines

    Neural Networks

    (2001)
  • N.L. Tsakiridis et al.

    Simultaneous prediction of soil properties from VNIR-SWIR spectra using a localized multi-channel 1-D convolutional neural network

    Geoderma

    (2020)
  • A. Tugnolo et al.

    A reliable tool based on near-infrared spectroscopy for the monitoring of moisture content in roasted and ground coffee: A comparative study with thermogravimetric analysis

    Food Control

    (2021)
  • S.S. Turgut et al.

    Estimation of the sensory properties of black tea samples using non-destructive near-infrared spectroscopy sensors

    Food Control

    (2022)