Regular Article
Soft-sensor development for biochemical systems using genetic programming
Introduction
In today's industrial world, various types of sensors are needed to provide fast and reliable measurements of a wide variety of process and/or product related variables and parameters. Sensors are sophisticated devices that detect, and produce a measurable response to, a change in, for example, the chemical, physical, electrical, biochemical or optical state of a system. These measurements assist process operators and engineers in: (i) knowing the current state of the process, (ii) controlling and monitoring the process, (iii) detecting and diagnosing abnormal process behavior in a timely manner, (iv) taking corrective action in the event of abnormal process behavior, and (v) optimizing the performance of the process with a view to minimizing costs and/or improving its efficiency. In many instances, however, an appropriate robust hardware-based sensor for measuring a process variable is unavailable, or the alternative analytical procedure is time-consuming and tedious. In such situations, the alternative of developing a soft-sensor should be explored. A soft-sensor is a software module capable of estimating a process variable in real time. This module comprises a mathematical model, which makes use of the available quantitative knowledge regarding other process variables and parameters to estimate the magnitude of the chosen variable. The available information pertaining to other variables could be in the form of sensor measurements and/or mathematical models.
The challenges involved in soft-sensor development for biochemical processes are the same as those encountered in their modeling, optimization and control; the notable ones are listed below.
- Bioprocesses are characterized by their complex dynamics, such as inverse response, dead time and strong nonlinearities. These stem primarily from their main driving force, namely, micro-organisms (cells), which are very sensitive to any variations in the reaction environment (e.g., temperature, substrate concentration, pH, among others) [1].
- An important class of bioprocesses, i.e., batch fermentation, commonly evolves through three stages, namely the lag, exponential and stationary stages. The factors that influence the behavior of micro-organisms vary in each stage, owing to which the batch fermentation system exhibits different nonlinear characteristics in different stages. As a result, a global soft-sensor model for batch fermentation leads to a complicated structure with limited prediction accuracy [2].
- Crucial biochemical variables and/or parameters are hard to measure online in bioprocesses such as batch fermentation.
- In bioprocesses involving induced cultures, there exists a variation in the morphology, energy metabolism and macroscopic composition of the cells; hence quantification of “biomass” or similar variables is not straightforward [3].
Since the 1970s and 1980s, the cost of computer-based instrumentation has fallen significantly, and the soft-sensor concept has gained ground in process estimation and inferential control [4], bioprocess monitoring [5], [6], control of nonlinear bioprocesses [7], biological wastewater treatment [8], melt index prediction [9], etc. The two principal approaches to developing a soft-sensor are phenomenological and empirical modeling. The former is employed when detailed knowledge of the physico-chemical phenomena (kinetics, mass transfer, thermodynamics, etc.) underlying the process is available. Very often, gaining this knowledge is itself a tedious and costly task owing to the complex nature of the process and the extensive experimentation involved in collecting the necessary data. These difficulties can make the phenomenological modeling route to soft-sensor development impractical. In such a situation, empirical modeling can be resorted to for the development of a soft-sensor.
There exist three commonly utilized methodologies for developing empirical models, namely regression analysis, artificial neural networks (ANNs) and support vector regression (SVR). For a pre-specified data-fitting function, linear/nonlinear regression estimates the magnitudes of the function parameters that fit the given input–output data. Since many chemical and biochemical processes exhibit nonlinear behavior, choosing an appropriate data-fitting function from a large number of possible alternatives becomes a daunting task. Despite expending a huge effort in guessing and testing different nonlinear data-fitting functions, there is no guarantee that a well-fitting function can indeed be secured in a finite number of trials. The other two empirical modeling formalisms, viz. ANNs and SVR, overcome the difficulties associated with regression analysis since they do not require specification of the exact form of the data-fitting function. Accordingly, the ANN and SVR formalisms have been exploited in the development of soft-sensors and related applications including control of a distillation process [10], fed-batch reactor operation [11], and hybrid modeling of a fermentation process [12]. Although these are potent nonlinear function approximation methods with wide applicability, ANNs and, to some extent, SVR generate “black box” models whose structure and parameters do not provide any insight into the phenomena underlying the process being modeled.
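The dependence of regression on a pre-specified function form can be illustrated with a small sketch. The data below are synthetic and noise-free (they are not from the paper), and the two candidate forms, the helper names `fit_line` and `sse`, and the log-linearization used for the exponential fit are all assumptions chosen for illustration.

```python
import math

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sse(model, xs, ys):
    """Sum of squared errors of a fitted model on the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Synthetic illustration data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Candidate form 1: a straight line, y = a1 * x + b1.
a1, b1 = fit_line(xs, ys)

def linear(x):
    return a1 * x + b1

# Candidate form 2: y = c * exp(k * x), fitted by a linear fit on log(y).
k, log_c = fit_line(xs, [math.log(y) for y in ys])
c = math.exp(log_c)

def exponential(x):
    return c * math.exp(k * x)

sse_linear = sse(linear, xs, ys)
sse_exponential = sse(exponential, xs, ys)
print(f"linear SSE = {sse_linear:.4g}, exponential SSE = {sse_exponential:.4g}")
```

Both fits estimate only the parameters of a form chosen in advance; the quality of the result depends on whether the chosen form happens to match the process, which is the difficulty the paragraph above describes.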
In the present study, an exclusively data-driven, artificial intelligence (AI) based modeling paradigm known as genetic programming (GP) [13] has been proposed for developing soft-sensor models for biochemical processes. Given multiple input–single output (MISO) data, the novelty of the GP formalism lies in its ability to search and optimize the form as well as the parameters of an appropriate linear/nonlinear data-fitting function. Despite its novelty and attractive properties, the GP formalism has not been explored for data-driven modeling applications in chemical and biochemical sciences/engineering to the same extent as the other exclusively data-driven modeling methods, namely ANNs and SVR. In one of the soft-sensor applications involving the GP formalism, Kordon et al. [14] developed a soft-sensor for emission estimation in one of the Dow Chemical Company plants in Freeport, TX. In that study, the soft-sensor was developed by integrating three computational intelligence approaches, namely, GP, analytical neural networks, and support vector machines. A rigorous literature survey indicates that the present study is the first in which the GP formalism has been utilized for the development of soft-sensors for biochemical processes. The efficacy of the GP-based soft-sensors for biochemical processes has been demonstrated by conducting two case studies involving microorganism-assisted extracellular production of lipase and production of bacterial poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer. In these case studies, MISO example data sets have been utilized in searching and optimizing the functional form (structure) as well as the parameters of the MISO data-fitting functions (soft-sensors). While in the first case study the soft-sensor predicts the time-dependent lipase activity (U/ml), in the second case study it predicts the amount of accumulated polyhydroxyalkanoates (% dcw). The prediction accuracy and generalization capability of the GP-based soft-sensors have been compared with those developed using the ANN and SVR formalisms.
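The idea of searching over both the form and the parameters of a MISO data-fitting function can be sketched with a toy symbolic-regression loop. This is a minimal illustration and not the authors' implementation: the function set (`+`, `-`, `*`), the tuple-based tree encoding, the population size, the size cap of 31 nodes, the tournament selection scheme and the synthetic data are all assumptions.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
N_INPUTS = 2    # MISO: two inputs, one output (assumed for the example)
MAX_SIZE = 31   # cap on tree size to limit bloat (assumed)

def random_tree(depth):
    """Grow a tree: (op, left, right), ("x", index) or ("c", constant)."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return ("x", random.randrange(N_INPUTS))
        return ("c", round(random.uniform(-2.0, 2.0), 2))
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, inputs):
    if tree[0] == "x":
        return inputs[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], inputs), evaluate(tree[2], inputs))

def size(tree):
    return 1 if tree[0] not in OPS else 1 + size(tree[1]) + size(tree[2])

def fitness(tree, data):
    """Sum of squared errors over the data set; lower is better."""
    total = 0.0
    for inputs, target in data:
        diff = evaluate(tree, inputs) - target
        total += diff * diff
    return total if math.isfinite(total) else float("inf")

def random_subtree(tree):
    """Walk down from the root and stop at a randomly chosen node."""
    while tree[0] in OPS and random.random() < 0.6:
        tree = random.choice(tree[1:])
    return tree

def replace_random(tree, new):
    """Copy of tree with one randomly chosen subtree replaced by new."""
    if tree[0] not in OPS or random.random() < 0.3:
        return new
    if random.random() < 0.5:
        return (tree[0], replace_random(tree[1], new), tree[2])
    return (tree[0], tree[1], replace_random(tree[2], new))

def crossover(a, b):
    child = replace_random(a, random_subtree(b))
    return child if size(child) <= MAX_SIZE else a

def mutate(tree):
    child = replace_random(tree, random_tree(2))
    return child if size(child) <= MAX_SIZE else tree

def tournament(pop, scores, k=3):
    picks = random.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: scores[i])]

def evolve(data, pop_size=60, generations=30, seed=0):
    random.seed(seed)
    pop = [random_tree(3) for _ in range(pop_size)]
    history = []  # best error in each generation
    for _ in range(generations):
        scores = [fitness(t, data) for t in pop]
        best = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        nxt = [pop[best]]  # elitism: the best tree always survives
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores), tournament(pop, scores))
            if random.random() < 0.2:
                child = mutate(child)
            nxt.append(child)
        pop = nxt
    scores = [fitness(t, data) for t in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best], history

# Synthetic MISO data from y = x0 * x1 + x0 (illustration only).
data = [((x0, x1), x0 * x1 + x0) for x0 in (0.0, 1.0, 2.0, 3.0) for x1 in (0.5, 1.5, 2.5)]
best_tree, best_sse, history = evolve(data)
print(best_tree, best_sse)
```

Unlike the ANN and SVR models, the returned tree is an explicit expression in the input variables, which is what makes the resulting model inspectable.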
This paper is structured as follows. Section 2 provides a detailed description of the GP formalism and its implementation. The commonly used feed-forward artificial neural network, namely the multilayer perceptron (MLP), and the machine learning based SVR formalism are described in Sections 3 and 4, respectively. The two case studies wherein the GP-based soft-sensor models have been developed for two biochemical systems are presented in Section 5. This section also provides the results of the comparison of the GP, MLP and SVR based soft-sensor models pertaining to the two biochemical systems. Finally, Section 6 summarizes the principal findings of the study.
Section snippets
Genetic programming (GP)
In its original form, the GP formalism was proposed as a method for automatically generating computer programs that perform predefined tasks [13]. It is an extension of the genetic algorithm (GA) formalism [15]. Given an objective function, the GA efficiently searches and optimizes the values of the decision variables that would maximize or minimize the function. Similar to the GA, the GP is founded on the Darwinian principles of natural selection and reproduction. Accordingly, the GP
Artificial neural networks
The basic MLP structure portrayed in Fig. 6 is composed of three layers, namely input, hidden and output layers consisting of N, M and L processing elements (also termed “nodes” or “neurons”), respectively (where L = 1). Given the data set D, containing Np measurements of the input (independent/causal/predictor)–output (dependent/response) variables, the MLP learns the nonlinear interrelationships existing between them by appropriately adjusting the inter-node connection weights. The objective of
Support vector regression (SVR)
In recent years, support vector regression (SVR) [35], [36] has gained widespread acceptance in the construction of data-driven nonlinear models. It is an adaptation of the statistical/machine learning theory based classification paradigm, namely support vector machines [35]. The SVR possesses some desirable characteristics, such as good generalization ability of the regression function, robustness of the solution, sparseness of the regression and an automatic control of the solution
Case studies
This section provides details of the development of GP-based soft-sensor models for two biochemical processes, namely microorganism assisted extracellular production of lipase and production of a bacterial copolymer. While in case study-I, the soft-sensor predicts the time-dependent lipase activity (U/ml), in case study-II the soft-sensor predicts the amount of accumulated polyhydroxyalkanoates (% dcw). The prediction performance of the GP-based soft-sensors has been compared with those
Conclusion
In the present paper, a novel and exclusively data-driven AI-based modeling formalism, namely genetic programming, has been introduced for the development of soft-sensors for bioprocesses. The efficacy of the GP-based soft-sensors has been successfully demonstrated by conducting two case studies involving production of the lipase enzyme and bacterial poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer, respectively. In these case studies, the soft-sensors predict the time-dependent
References (59)
- et al. Soft-sensors for process estimation and inferential control. J. Process Control (1991)
- et al. Enhancing bioprocess operability with generic software sensors. J. Biotechnol. (1992)
- et al. Software FIACRE: bioprocess monitoring on the basis of flow injection analysis using simultaneously a urea optode and a glucose luminescence sensor. J. Biotechnol. (1993)
- et al. Data-derived soft-sensors for biological wastewater treatment plants: an overview. Environ. Modell. Softw. (2013)
- et al. A soft sensor based on adaptive fuzzy neural network and support vector regression for industrial melt index prediction. Chemom. Intell. Lab. Syst. (2013)
- et al. Soft-sensor development for fed-batch bioreactors using support vector regression. Biochem. Eng. J. (2006)
- A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput. Chem. Eng. (2012)
- et al. Real-time vapour sensing using an OFET-based electronic nose and genetic programming. Sens. Actuators B (2009)
- et al. Synthesis of heat-integrated complex distillation systems via Genetic Programming. Comput. Chem. Eng. (2008)
- et al. An improved genetic programming technique for the classification of Raman spectra. Knowledge-Based Syst. (2005)
- Symbolic regression via genetic programming in the optimization of a controlled release pharmaceutical formulation. Chemom. Intell. Lab. Syst.
- Simultaneous modelling of enzyme production and biomass growth in recombinant Escherichia coli using artificial neural networks. Biochem. Eng. J.
- Combining first principles modelling and artificial neural networks: a general framework. Comput. Chem. Eng.
- Application of neural network to lysine production. Biochem. Eng. J.
- Optimization of the Bacillus thuringiensis var. kurstaki HD-1 δ-endotoxins production by using experimental mixture design and artificial neural networks. Biochem. Eng. J.
- ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng.
- Hybrid process modelling and optimization strategies integrating neural networks/support vector regression and genetic algorithms: study of benzene isopropylation of Hbeta catalyst. Chem. Eng. J.
- Lipase catalysed hydrolysis of short-chain substrates in solution and in emulsion: a kinetic study. Biochim. Biophys. Acta
- Cold active microbial lipases: some hot issues and recent developments. Biotechnol. Adv.
- Optimization of extracellular lipase production by Geotrichum sp. using factorial design. Bioresour. Technol.
- Biosensors for the evaluation of lipase activity. J. Mol. Catal. B: Enzym.
- Biodegradable plastics. Curr. Opin. Biotechnol.
- Effect of growth condition on endo- and exopolymer biosynthesis in Anabaena cylindrica 10C. Phytochemistry
- Process optimization for poly(3-hydroxybutyrate-co-3-hydroxyvalerate) co-polymer production by Nostoc muscorum. Biochem. Eng. J.
- State and parameter estimation based on a nonlinear filter applied to an industrial process control of ethanol production. Braz. J. Chem. Eng.
- Staged soft-sensor modelling for batch fermentation process
- Real-time estimation of biomass and specific growth rate in physiologically variable recombinant fed-batch processes. Bioprocess Biosyst. Eng.
- Dynamical modelling, analysis, monitoring and control design for nonlinear bioprocesses. Adv. Biochem. Eng. Biotechnol.
- The application of neural network soft sensor technology to an advanced control system of distillation operation