Predictions of multiple food quality parameters using near-infrared spectroscopy with a novel multi-task genetic programming approach

doi:10.1016/j.foodcont.2022.109389

Food Control

Volume 144, February 2023, 109389

Abstract

To meet the increasing demand for food safety and quality, the food industry urgently needs new methods for the simultaneous and rapid determination of multiple food quality parameters (FQPs). Combining near-infrared (NIR) spectroscopy with spectral prediction models for rapid, repeatable, non-destructive, and low-running-cost quantitative analysis of FQPs is becoming increasingly popular in the food industry. However, most existing spectrum-based prediction models are trained under a single-task learning framework; that is, a separate model relating the spectrum to each quality parameter is constructed. This paradigm ignores possible connections among the prediction tasks of different FQPs, which may degrade the performance of each single-FQP prediction model. This study proposes a novel multi-task genetic programming-based approach, named EM4GPO, for building prediction models for multiple FQPs simultaneously. In EM4GPO, multi-dimensional trees encode the raw NIR spectrum into features shared by multiple FQPs; for each FQP, least squares support vector regression (LS-SVR) modeling is performed on the shared features to obtain private features and a prediction model; during optimization, a new algorithm optimizes the previously obtained shared and private features and the LS-SVR prediction models through population evolution, combining the multidimensional multiclass genetic programming with multidimensional populations optimization method with the nondominated sorting method. The proposed EM4GPO model is evaluated and compared with nine popular NIR prediction models using 10 NIR spectral datasets. The experimental results showed that EM4GPO outperformed the other commonly used methods on all datasets, which indicates that EM4GPO is competitive and effective for predicting multiple FQPs from the NIR spectrum.

Introduction

As globalization proceeds, the food industry has evolved rapidly toward high-volume, high-quality, and safe food production (Nakat & Bou-Mitri, 2021). As a result, requirements for the fast, accurate, and non-destructive assessment of food quality have increased significantly over the past decade. Among the available techniques, near-infrared (NIR) spectroscopy has attracted considerable attention in existing studies (Cortés, Blasco, Aleixos, Cubero, & Talens, 2019; Lu et al., 2022). This technology enables quantitative or qualitative detection of food products via indirect measurements (Cai et al., 2021; Li, Jang, Li, & Liu, 2021). The general procedure is to (1) capture the spectral information of food products at different wavelengths by NIR spectroscopy, and (2) develop prediction models that determine the food quality parameters (FQPs), such as the chemical content and mechanical properties of food products, from the corresponding spectral responses (Hernández-Hernández, Fernández-Cabanás, Rodríguez-Gutiérrez, Fernández-Prior, & Morales-Sillero, 2022; Shen et al., 2022; Tugnolo et al., 2021; Turgut, Entrenas, Taşkın, Garrido-Varo, & Pérez-Marín, 2022). However, NIR spectra are usually highly complex: many overlapping peaks produce broad absorption bands, and the wavelengths lie in the region of overtones and combination modes (Manley, 2014). Therefore, the development of FQP prediction models is one of the most critical aspects of food quality measurement using NIR spectroscopy, but it remains challenging.

Spectral prediction models are usually built on a regression framework. Linear regression techniques, such as multiple linear regression (MLR) (Liew & Lau, 2012) and partial least squares regression (PLSR) (Rossi & Lozano, 2020), have achieved great success in the quantitative evaluation of food quality. In addition, several nonlinear regression approaches, such as least squares support vector regression (LS-SVR) (Li, Huang, Zhao, & Zhang, 2013), regression genetic programming (M3GPSpectra) (Yang, Wang, Zhao, Huang, & Zhu, 2021), and deep convolutional neural networks (DeepSpectra) (Zhang, Lin, Xu, Luo, & Ying, 2019), have been applied to develop prediction models that handle the complex nonlinear relationship between NIR spectral variables and FQPs. The above-mentioned methods, known as single-task learning (STL) methods, each estimate a single FQP. Although these methods are easy to use and have achieved outstanding results, they face two challenges. On the one hand, learning methods, especially deep learning, require extensive sample data and the corresponding labels (FQPs). However, the labels are usually obtained through destructive and time-consuming measurement procedures, which leads to a shortage of labeled samples and unreliable prediction results (Goisser, Hey, & Kesselheim, 2020; Zhang, Ding, Wang, Guo, & Li, 2020). On the other hand, the food industry often needs to predict multiple FQPs simultaneously. Although this can be accomplished with a combination of STL methods, STL modeling treats each FQP prediction as an independent task, which blocks the exchange of information among different FQPs and is typically not conducive to developing high-performance FQP prediction models (Mishra & Passos, 2022).
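The single-task setup described above can be illustrated with a minimal LS-SVR sketch in which one model is fitted per FQP, independently of the others. This is not the authors' implementation: the RBF kernel, the hyperparameter values (gamma, sigma), the function names, and the synthetic "spectra" and "FQPs" are all assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVR linear system for the bias b and dual weights alpha.

    The system is [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    gamma and sigma are assumed values, not taken from the paper.
    """
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict with a fitted LS-SVR: f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-ins for spectra (40 samples x 6 variables) and two FQPs.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y = np.column_stack([np.sin(X[:, 0]) + 0.5 * X[:, 1], X[:, 2] ** 2 - X[:, 3]])

# Single-task learning: one independent model per FQP, nothing is shared.
models = [lssvr_fit(X, Y[:, j]) for j in range(Y.shape[1])]
preds = np.column_stack([lssvr_predict(X, b, a, X) for b, a in models])
train_rmse = np.sqrt(((preds - Y) ** 2).mean(axis=0))
```

Because each call to lssvr_fit sees only its own column of Y, nothing learned for one FQP can help the other, which is the limitation the paragraph above attributes to STL modeling.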

To overcome the limitations of the STL methods, much effort has been made to explore multi-task learning (MTL) methods that jointly train models for different related tasks. During the concurrent learning process, each model shares the knowledge it has learned with the other models as meta-information for training them, which alleviates the cold-start problem that STL methods face when only a few training samples are available (Feng, Liu, Wu, & Zuo, 2022). Meanwhile, the tasks constrain each other through information sharing and thereby avoid over-learning, so that all prediction models can attain better generalization (Ng et al., 2019). MTL methods such as multi-response partial least-squares regression (PLS2) (Pedro & Ferreira, 2007) and multi-task least-squares support vector regression (MTLSSVR) (Xu, An, Qiao, Zhu, & Li, 2013) have achieved outstanding performance in food quality evaluation based on NIR spectroscopy. However, the development of these classical MTL models depends on effective features for FQP prediction, which are difficult to craft manually (Hekler et al., 2019). The emergence of artificial intelligence technology, such as the multi-task deep learning (MTDL) method and the evolutionary multitask optimization (EMTO) algorithm, has greatly promoted the development of multiple-parameter prediction (Sosnin et al., 2019; Zhong, Feng, Cai, & Ong, 2018). Currently, MTDL methods typically employ 1-dimensional (1D) convolution modules to extract shared features from the raw spectrum, then map these features to the final output layer using a fully connected layer with a linear activation function, and finally yield multiple quality parameters, each of which corresponds to an output node (Assadzadeh, Walker, McDonald, Maharjan, & Panozzo, 2020; Tsakiridis, Keramaris, Theocharis, & Zalidis, 2020).

As described above, the 1D convolution module extracts shared features that contain single-scale local features but no long-distance dependencies from the raw NIR spectrum, so FQPs mapped from these shared features may be unreliable (Zhao, Wu, & Liu, 2022). Additionally, because deep learning models are data-hungry, MTDL methods easily overfit when learning prediction models (Zhang et al., 2019). In EMTO, the common information, which is usually captured via the evolving population, is used to optimize the concurrent related tasks. The information is transferred across tasks and updated in the transfer process, so that EMTO can learn shared information (shared features) that generalizes to all tasks. This learning paradigm is suitable for scenarios with insufficient data (Zheng, Qin, Gong, & Zhou, 2019). However, EMTO attaches great importance to the commonality of all tasks but overlooks the individuality of each task, and may encounter negative transfer of knowledge (Xu, Qin, & Xia, 2021).
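The MTDL structure described above (1D convolution for shared features, then a fully connected layer with linear activation and one output node per FQP) can be sketched as a bare forward pass. This is only an illustration with random, untrained weights; the layer sizes, the ReLU nonlinearity, and all names are assumptions and do not reproduce any cited network.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide one kernel along a 1-D spectrum (stride 1, no padding)."""
    # np.convolve flips its second argument, so flip the kernel back to get
    # the cross-correlation that convolution layers compute.
    return np.convolve(x, kernel[::-1], mode="valid")

def shared_features(spectrum, kernels):
    """Shared features: ReLU of each kernel's feature map, concatenated."""
    maps = [np.maximum(conv1d_valid(spectrum, k), 0.0) for k in kernels]
    return np.concatenate(maps)

def mtdl_forward(spectrum, kernels, W, b):
    """Linear output layer: one output node per FQP on the shared features."""
    return W @ shared_features(spectrum, kernels) + b

# Assumed sizes: 50 wavelengths, 3 kernels of width 5, 2 FQPs.
rng = np.random.default_rng(0)
n_wavelengths, kernel_size, n_kernels, n_fqps = 50, 5, 3, 2
spectrum = rng.normal(size=n_wavelengths)
kernels = rng.normal(size=(n_kernels, kernel_size))
n_shared = n_kernels * (n_wavelengths - kernel_size + 1)
W = rng.normal(size=(n_fqps, n_shared))
b = np.zeros(n_fqps)

outputs = mtdl_forward(spectrum, kernels, W, b)  # one value per FQP
```

Each shared feature here depends on only kernel_size adjacent wavelengths, which is the single-scale, local behavior the text describes: a convolution layer of this kind does not by itself relate distant spectral regions.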

Motivated by the aforementioned MTL methods, this study proposes an evolutionary multitask multidimensional multiclass genetic programming with multidimensional populations optimization (EM4GPO) algorithm for the simultaneous prediction of multiple FQPs. The main contributions of this work are to (1) use a type of multi-dimensional tree to extract multi-scale and multi-type NIR spectrum-based shared features for multiple FQPs; (2) design an LS-SVR for each FQP to learn private features and a prediction model based on the extracted shared features; (3) develop an efficient multi-task genetic programming algorithm, based on the M3GP algorithm and the nondominated sorting method, to solve the optimization problem of the shared and private features and the LS-SVR prediction models; and (4) validate the effectiveness of EM4GPO on ten NIR spectral datasets.
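The nondominated sorting step named in contribution (3) can be illustrated with a short, generic sketch. The assumption here, not stated in the text above, is that each candidate individual is scored by one error value per FQP (lower is better); the scores are invented for illustration, and the simple repeated-filter procedure is not necessarily the variant used in EM4GPO.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    All objectives are minimized (e.g. one prediction error per FQP).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(scores):
    """Group candidate indices into successive Pareto fronts."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        # A candidate is in the current front if nothing remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical (error on FQP 1, error on FQP 2) for five candidates.
scores = [(0.30, 0.80), (0.50, 0.40), (0.60, 0.90), (0.30, 0.40), (0.70, 0.95)]
fronts = nondominated_sort(scores)
print(fronts)  # [[3], [0, 1], [2], [4]]
```

Candidates 0 and 1 share a front because each is better on one FQP and worse on the other; ranking by fronts avoids collapsing the per-FQP errors into a single weighted score.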

Dataset and data partition

Apple dataset: Apple samples of three varieties, ‘Jonagold’ (JG), ‘Golden Delicious’ (GD), and ‘Delicious’ (RD), were harvested in 2009 and 2010 from an orchard at Michigan State University's Clarksville Horticultural Experiment Station in Clarksville, MI. After harvest, the apples were transferred to the U.S. Department of Agriculture Agricultural Research Service (Michigan State University, Michigan, USA) for storage and testing (Mendoza, Lu, & Cen, 2014). The spectral data of apple samples

EM4GPO prediction

As discussed in Section 3.4, the ‘nondominated sorting method’ has a dominant effect on the search for near-optimal results of the EM4GPO method. Hence, the M3GP-LSSVR algorithm, which uses ‘Lexicographic tournament of size 5’ to select offspring, is first compared with EM4GPO. Table 3 compares the prediction results of the apple and sugar beet datasets using the EM4GPO and M3GP-LSSVR models. All RPD values obtained by the EM4GPO models are higher than those of M3GP-LSSVR, and for the

Conclusions

In this paper, NIR spectroscopy combined with a novel multi-task genetic programming approach, EM4GPO, is developed to predict multiple FQPs simultaneously. This approach first introduces multi-dimensional trees for encoding the raw NIR spectrum, which extracts multi-scale and multi-type shared features of multiple FQPs. Then, the private features and prediction model for each FQP are obtained by conducting an LS-SVR on the shared features. During the modeling process, the shared and private

CRediT authorship contribution statement

Yu Yang: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft. Shangpeng Sun: Writing – review & editing, Supervision. Leiqing Pan: Data curation, Investigation, Supervision. Min Huang: Formal analysis, Supervision, Funding acquisition. Qibing Zhu: Writing – review & editing, Formal analysis, Supervision, Funding acquisition.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 62273166, 61772240, U2003114). The authors thank the China Scholarship Council (CSC) for the financial support to the author (Yu Yang) to conduct his doctoral research in the Department of Bioresource Engineering at McGill University. The authors would like to thank Dr. Renfu Lu of the U.S. Department of Agriculture Agricultural Research Service (USDA/ARS) at Michigan State University, East Lansing, Michigan, for

References (63)

A.A. Agyekum et al. FT-NIR coupled chemometric methods rapid prediction of K-value in fish. Vibrational Spectroscopy (2020)
R. Amsaraj et al. Variable selection coupled to PLS2, ANN and SVM for simultaneous detection of multiple adulterants in milk using spectral data. International Dairy Journal (2021)
T.C. Bora et al. Multi-objective optimization of the environmental-economic dispatch with reinforcement learning based on non-dominated sorting genetic algorithm. Applied Thermal Engineering (2019)
V. Cortés et al. Monitoring strategies for quality control of agricultural products using visible and near-infrared spectroscopy: A review. Trends in Food Science & Technology (2019)
X. Feng et al. Social recommendation via deep neural network-based multi-task learning. Expert Systems with Applications (2022)
S. Goisser et al. Comparison of colorimeter and different portable food-scanners for non-destructive prediction of lycopene content in tomato fruit. Postharvest Biology and Technology (2020)
B. Gyawali et al. Evaluating the evidence behind the surrogate measures included in the FDA's table of surrogate endpoints as supporting approval of cancer drugs. EClinicalMedicine (2020)
A. Hekler et al. Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. European Journal of Cancer (2019)
C. Hernández-Hernández et al. Rapid screening of unground cocoa beans based on their content of bioactive compounds by NIR spectroscopy. Food Control (2022)
B. Huang et al. Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications. Expert Systems with Applications (2010)
L.E.C. La Rosa et al. Multi-task fully convolutional network for tree species mapping in dense forests using small training hyperspectral data. ISPRS Journal of Photogrammetry and Remote Sensing (2021)
A. Lee et al. Non-destructive prediction of soluble solid contents in Fuji apples using visible near-infrared spectroscopy and various statistical methods. Journal of Food Engineering (2022)
J. Li et al. A comparative study for the quantitative determination of soluble solids content, pH and firmness of pears by Vis/NIR spectroscopy. Journal of Food Engineering (2013)
L. Li et al. Wavelength selection method for near-infrared spectroscopy based on standard-sample calibration transfer of mango and apple. Computers and Electronics in Agriculture (2021)
C. Lin et al. Determination of grain protein content by near-infrared spectrometry and multivariate calibration in barley. Food Chemistry (2014)
F. Mendoza et al. Grading of apples based on firmness and soluble solids content using Vis/SWNIR spectroscopy and spectral scattering techniques. Journal of Food Engineering (2014)
P. Mishra et al. Multi-output 1-dimensional convolutional neural networks for simultaneous prediction of different traits of fruit based on near-infrared spectroscopy. Postharvest Biology and Technology (2022)
P. Mishra et al. Improving moisture and soluble solids content prediction in pear fruit using near-infrared spectroscopy with variable selection and model updating approach. Postharvest Biology and Technology (2021)
Z. Nakat et al. COVID-19 and the food industry: Readiness assessment. Food Control (2021)
S. Nawar et al. Optimal sample selection for measurement of soil organic carbon using on-line vis-NIR spectroscopy. Computers and Electronics in Agriculture (2018)
W. Ng et al. Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra. Geoderma (2019)
J. Padarian et al. Using deep learning to predict soil properties from regional spectral data. Geoderma Regional (2019)
L. Pan et al. Measurement of moisture, soluble solids, sucrose content and mechanical properties in sugar beet using portable visible and near-infrared spectroscopy. Postharvest Biology and Technology (2015)
L. Pan et al. Determination of sucrose content in sugar beet by portable visible and near-infrared spectroscopy. Food Chemistry (2015)
A.M. Pedro et al. Simultaneously calibrating solids, sugars and acidity of tomato products using PLS2 and NIR spectroscopy. Analytica Chimica Acta (2007)
G.B. Rossi et al. Simultaneous determination of quality parameters in yerba mate (Ilex paraguariensis) samples by application of near-infrared (NIR) spectroscopy and partial least squares (PLS). Lebensmittel-Wissenschaft & Technologie (2020)
G. Shen et al. Rapid and nondestructive quantification of deoxynivalenol in individual wheat kernels using near-infrared hyperspectral imaging and chemometrics. Food Control (2022)
J.A. Suykens et al. Optimal control by least squares support vector machines. Neural Networks (2001)
N.L. Tsakiridis et al. Simultaneous prediction of soil properties from VNIR-SWIR spectra using a localized multi-channel 1-D convolutional neural network. Geoderma (2020)
A. Tugnolo et al. A reliable tool based on near-infrared spectroscopy for the monitoring of moisture content in roasted and ground coffee: A comparative study with thermogravimetric analysis. Food Control (2021)
S.S. Turgut et al. Estimation of the sensory properties of black tea samples using non-destructive near-infrared spectroscopy sensors. Food Control (2022)
