Elsevier

Biochemical Engineering Journal

Volume 85, 15 April 2014, Pages 89-100

Regular Article
Soft-sensor development for biochemical systems using genetic programming

https://doi.org/10.1016/j.bej.2014.02.007

Highlights

  • Soft-sensors aid in real-time estimation of difficult-to-measure process variables.

  • Genetic programming introduced for development of soft-sensors for bioprocesses.

  • Given process data, GP finds both form and parameters of a data-fitting function.

  • GP-based soft-sensors with a good prediction accuracy developed for bioprocesses.

  • GP-based strategy has numerous applications in data-driven process modeling.

Abstract

Soft-sensors are software-based process monitoring systems/models. In real time, they estimate process variables that are difficult to measure online or whose measurement by analytical procedures is tedious and time-consuming. In this study, genetic programming (GP), an artificial intelligence based data-driven modeling formalism, has been introduced for the development of soft-sensors for biochemical processes. The novelty of GP is that, given example input–output data, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. GP-based soft-sensors have been developed for two bioprocesses, namely extracellular production of lipase enzyme and bacterial production of poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer. In case study-I, the soft-sensor predicts the time-dependent lipase activity (U/ml), whereas in case study-II it predicts the amount of accumulated polyhydroxyalkanoates (% dcw). The prediction and generalization performance of the GP-based soft-sensors was compared with that of the corresponding multi-layer perceptron (MLP) neural network and support vector regression (SVR) based soft-sensors. This comparison indicates that in the first case study the GP-based soft-sensor, with training and test set correlation coefficient (root-mean-squared-error) magnitudes of >0.96 (≈0.962 U/ml), clearly outperformed the other two soft-sensors. In case study-II, involving bacterial copolymer production, the GP and SVR based soft-sensors performed equally well (correlation coefficient 0.98), while the MLP based soft-sensor's performance was relatively inferior (correlation coefficient 0.94).

Introduction

In today's industrial world, various types of sensors are needed to provide speedy and reliable measurements of a wide variety of process and/or product related variables and parameters. Sensors are sophisticated devices used in detecting and producing a measurable response to a change in, for example, the chemical, physical, electrical, biochemical or optical state of a system. These measurements assist process operators and engineers in: (i) knowing the current state of the process, (ii) controlling and monitoring the process, (iii) detecting and diagnosing any abnormal process behavior in a timely manner, (iv) taking corrective actions in the event of abnormal process behavior, and (v) optimizing the performance of the process with a view to minimizing costs and/or improving its efficiency. In many instances, however, an appropriate hardware-based robust sensor for measuring a process variable is unavailable, or the alternative analytical procedure is time-consuming and tedious. In such situations, the alternative of developing a soft-sensor should be explored. A soft-sensor is a software module capable of estimating a process variable in real time. This module comprises a mathematical model, which makes use of the available quantitative knowledge regarding other process variables and parameters to estimate the magnitude of the chosen variable. The available information pertaining to other variables could be in the form of sensor measurements and/or mathematical models.
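As a rough illustration of this definition, a soft-sensor can be sketched as a thin wrapper around a model function that maps readily measured variables to an estimate of the hard-to-measure one. The class name `SoftSensor`, the input names and the coefficients in `toy_model` below are hypothetical placeholders and are not taken from this study.

```python
from typing import Callable, Mapping, Sequence


class SoftSensor:
    """Hypothetical wrapper: estimates one variable from other measured variables."""

    def __init__(self, model: Callable[[Mapping[str, float]], float], inputs: Sequence[str]):
        self.model = model          # mathematical model (any callable)
        self.inputs = tuple(inputs) # names of the measured variables the model needs

    def estimate(self, measurements: Mapping[str, float]) -> float:
        # Refuse to estimate if a required measurement is absent.
        missing = [name for name in self.inputs if name not in measurements]
        if missing:
            raise KeyError(f"missing measurements: {missing}")
        # Pass only the variables the model was built on; extras are ignored.
        return self.model({name: measurements[name] for name in self.inputs})


def toy_model(x: Mapping[str, float]) -> float:
    # Placeholder relationship with made-up coefficients (illustration only).
    return 0.5 * x["temperature"] + 2.0 * x["pH"]


sensor = SoftSensor(toy_model, ("temperature", "pH"))
estimate = sensor.estimate({"temperature": 30.0, "pH": 7.0, "time": 12.0})
print(estimate)
```

In practice the placeholder `toy_model` would be replaced by whatever fitted model (GP, MLP, SVR or other) serves as the soft-sensor.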

The challenges involved in the soft-sensor development for biochemical processes are the same as encountered in their modeling, optimization and control; the notable ones are as given below.

  • Bioprocesses are characterized by their complex dynamics, such as inverse response, dead time and strong nonlinearities. These stem primarily from their main driving force, namely, micro-organisms (cells), which are very sensitive to any variations in the reaction environment (e.g., temperature, substrate concentration, pH, among others) [1].

  • An important class of bioprocesses, i.e., batch fermentation, commonly evolves through three stages, namely the lag, exponential and stationary stages. The factors that influence the behavior of micro-organisms vary in each stage, owing to which the batch fermentation system exhibits different nonlinear characteristics in different stages. As a result, a global soft-sensor model for batch fermentation leads to a complicated structure with limited prediction accuracy [2].

  • Crucial biochemical variables and/or parameters are hard to measure online in bioprocesses such as batch fermentation.

  • In bioprocesses involving induced cultures, there exists a variation in the morphology, energy metabolism and macroscopic composition of the cells; hence quantification of “biomass” or similar variables is not straightforward [3].

Since the 1970s–1980s, the cost of computer-based instrumentation has fallen significantly, and the concept of the soft-sensor has gained ground in process estimation and inferential control [4], bioprocess monitoring [5], [6], control of nonlinear bioprocesses [7], biological wastewater treatment [8], melt index prediction [9], etc. The two principal approaches to developing a soft-sensor are phenomenological and empirical modeling. The former approach is employed when detailed knowledge about the physico-chemical phenomena (kinetics, mass transfer, thermodynamics, etc.) underlying the process is available. Very often, gaining this knowledge itself becomes a tedious and costly task owing to the complex nature of the process and the extensive experimentation involved in collecting the necessary data. These difficulties make the phenomenological modeling route to soft-sensor development impractical. In such a situation, empirical modeling can be resorted to for the development of a soft-sensor.

There exist three commonly utilized methodologies for developing empirical models, namely regression analysis, artificial neural networks (ANNs) and support vector regression (SVR). For a pre-specified data-fitting function, linear/nonlinear regression estimates the magnitudes of the function parameters that fit the given input–output data. Since many chemical and biochemical processes exhibit nonlinear behavior, choosing an appropriate data-fitting function from a large number of possible alternatives becomes a daunting task. Despite expending a huge effort in guessing and testing different nonlinear data-fitting functions, there is no guarantee that a well-fitting function can indeed be secured in a finite number of trials. The other two empirical modeling formalisms, viz. ANNs and SVR, overcome the difficulties associated with regression analysis since they do not require specification of the exact form of the data-fitting function. Accordingly, the ANN and SVR formalisms have been exploited in the development of soft-sensors and related applications including control of a distillation process [10], fed-batch reactor operation [11], and hybrid modeling of a fermentation process [12]. Although these are potent nonlinear function approximation methods with wide applicability, ANNs and, to some extent, SVR generate “black box” models whose structure and parameters do not provide any insight into the phenomena underlying the process being modeled.
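The dependence of conventional regression on the pre-specified functional form can be illustrated with a small sketch. It fits two candidate forms, a straight line and an exponential (via a log transform), to the same data and compares their root-mean-squared errors. The data are synthetic, and the function names `fit_line` and `rmse` are illustrative choices rather than anything used in the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(ys, preds):
    """Root-mean-squared error between observations and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Synthetic data generated from y = 2 * exp(0.5 * x) (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Candidate form 1: y = a + b*x.
a_lin, b_lin = fit_line(xs, ys)
rmse_lin = rmse(ys, [a_lin + b_lin * x for x in xs])

# Candidate form 2: y = a * exp(b*x), fitted as ln(y) = ln(a) + b*x.
ln_a, b_exp = fit_line(xs, [math.log(y) for y in ys])
a_exp = math.exp(ln_a)
rmse_exp = rmse(ys, [a_exp * math.exp(b_exp * x) for x in xs])

print(f"linear form:      RMSE = {rmse_lin:.4f}")
print(f"exponential form: RMSE = {rmse_exp:.4f}")
```

Regression recovers good parameters only for the form that matches the data; the analyst still has to guess that form, which is the step the text describes as daunting.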

In the present study, an artificial intelligence (AI) based, exclusively data-driven modeling paradigm known as genetic programming (GP) [13] has been proposed for developing soft-sensor models for biochemical processes. Given multiple input–single output (MISO) data, the novelty of the GP formalism lies in its ability to search and optimize the form as well as the parameters of an appropriate linear/nonlinear data-fitting function. Despite its novelty and attractive properties, the GP formalism has not been explored for data-driven modeling applications in chemical and biochemical sciences/engineering to the same extent as the other exclusively data-driven modeling methods, namely ANNs and SVR. In one of the soft-sensor applications involving the GP formalism, Kordon et al. [14] developed a soft-sensor for emission estimation in one of the Dow Chemical Company plants in Freeport, TX. In that study, the soft-sensor was developed by integrating three computational intelligence approaches, namely GP, analytical neural networks, and support vector machines. A rigorous literature survey indicates that the present study is the first in which the GP formalism has been utilized for the development of soft-sensors for biochemical processes. The efficacy of the GP-based soft-sensors for biochemical processes has been demonstrated by conducting two case studies involving microorganism-assisted extracellular production of lipase and production of bacterial poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer. In these case studies, MISO example data sets have been utilized in searching and optimizing the functional form (structure) as well as the parameters of the MISO data-fitting functions (soft-sensors). In the first case study, the soft-sensor predicts the time-dependent lipase activity (U/ml), whereas in the second case study it predicts the amount of accumulated polyhydroxyalkanoates (% dcw). The prediction accuracy and generalization capability of the GP-based soft-sensors have been compared with those developed using the ANN and SVR formalisms.
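The idea of searching over both form and parameters can be sketched with a deliberately small GP-style symbolic regression. This is a minimal illustration under stated assumptions, not the implementation used in this study: candidate models are expression trees built from `+`, `-`, `*`, input variables and random constants; fitness is the RMSE on synthetic two-input, single-output data; new candidates come from tournament selection with subtree crossover or subtree mutation; constants change only through mutation. All function names and settings are illustrative.

```python
import math
import random

# Binary operators allowed inside candidate models (division omitted for simplicity).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def random_tree(rng, depth, n_inputs):
    """Grow a random expression: ("x", i), ("c", value) or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x", rng.randrange(n_inputs))
        return ("c", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1, n_inputs), random_tree(rng, depth - 1, n_inputs))


def evaluate(tree, x):
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def rmse(tree, data):
    """Fitness: root-mean-squared error; non-finite results count as infinitely bad."""
    total = 0.0
    for x, y in data:
        d = evaluate(tree, x) - y
        total += d * d
    value = math.sqrt(total / len(data))
    return value if math.isfinite(value) else float("inf")


def depth(tree):
    if tree[0] not in OPS:
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def paths(tree, prefix=()):
    """Yield the position of every node as a tuple of child indices."""
    yield prefix
    if tree[0] in OPS:
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def to_str(tree):
    if tree[0] == "x":
        return f"x{tree[1]}"
    if tree[0] == "c":
        return f"{tree[1]:.3g}"
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"


def evolve(data, n_inputs, pop_size=40, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3, n_inputs) for _ in range(pop_size)]
    history = []  # best RMSE at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda t: rmse(t, data))
        history.append(rmse(pop[0], data))
        nxt = pop[:2]  # elitism: the two best candidates survive unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=lambda t: rmse(t, data))  # tournament
            b = min(rng.sample(pop, 3), key=lambda t: rmse(t, data))
            if rng.random() < 0.8:
                donor = get(b, rng.choice(list(paths(b))))  # subtree crossover
            else:
                donor = random_tree(rng, 2, n_inputs)       # subtree mutation
            child = replace(a, rng.choice(list(paths(a))), donor)
            nxt.append(child if depth(child) <= max_depth else a)
        pop = nxt
    best = min(pop, key=lambda t: rmse(t, data))
    history.append(rmse(best, data))
    return best, history


# Synthetic MISO data from y = x0*x1 + 0.5*x0 (assumed for illustration only).
data = [((x0, x1), x0 * x1 + 0.5 * x0) for x0 in (0.0, 1.0, 2.0, 3.0) for x1 in (0.5, 1.5, 2.5)]
best, history = evolve(data, n_inputs=2)
print(to_str(best), history[-1])
```

Unlike an ANN or SVR model, the returned expression can be read directly as a formula. Practical GP implementations typically add richer function sets, constant tuning and separate training and test sets, none of which are shown here.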

This paper is structured as follows. Section 2 provides a detailed description of the GP formalism and its implementation. The commonly used feed-forward artificial neural network, namely the multilayer perceptron (MLP), and the machine learning based SVR formalism are described in Sections 3 and 4, respectively. The two case studies wherein the GP-based soft-sensor models have been developed for two biochemical systems are presented in Section 5. This section also provides results of the comparison of the GP, MLP and SVR based soft-sensor models pertaining to the two biochemical systems. Finally, Section 6 summarizes the principal findings of the study.

Section snippets

Genetic programming (GP)

In its original form, the GP formalism was proposed as a method for automatically generating computer programs that perform predefined tasks [13]. It is an extension of the genetic algorithm (GA) formalism [15]. Given an objective function, the GA efficiently searches and optimizes the values of the decision variables that would maximize or minimize the function. Similar to the GA, the GP is founded on the Darwinian principles of natural selection and reproduction. Accordingly, the GP

Artificial neural networks

The basic MLP structure portrayed in Fig. 6 is composed of three layers, namely input, hidden and output layers consisting of N, M and L processing elements (also termed “nodes” or “neurons”), respectively (where L = 1). Given the data set D, containing Np measurements of the input (independent/causal/predictor)–output (dependent/response) variables, the MLP learns the nonlinear interrelationships existing between them by appropriately adjusting the inter-node connection weights. The objective of

Support vector regression (SVR)

In recent years, support vector regression (SVR) [35], [36] has gained widespread acceptance in the construction of data-driven nonlinear models. It is an adaptation of the statistical/machine learning theory based classification paradigm, namely support vector machines [35]. The SVR possesses some desirable characteristics, such as good generalization ability of the regression function, robustness of the solution, sparseness of the regression and an automatic control of the solution

Case studies

This section provides details of the development of GP-based soft-sensor models for two biochemical processes, namely microorganism assisted extracellular production of lipase and production of a bacterial copolymer. While in case study-I, the soft-sensor predicts the time-dependent lipase activity (U/ml), in case study-II the soft-sensor predicts the amount of accumulated polyhydroxyalkanoates (% dcw). The prediction performance of the GP-based soft-sensors has been compared with those

Conclusion

In the present paper, a novel and exclusively data-driven AI-based modeling formalism, namely genetic programming, has been introduced for the development of soft-sensors for bioprocesses. The efficacy of the GP-based soft-sensors has been successfully demonstrated by conducting two case studies involving production of the lipase enzyme and bacterial poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer, respectively. In these case studies, the soft-sensors predict the time-dependent

References (59)

  • P. Barmpalexis et al. Symbolic regression via genetic programming in the optimization of a controlled release pharmaceutical formulation. Chemom. Intell. Lab. Syst. (2011)

  • M.E. Gunay et al. Simultaneous modelling of enzyme production and biomass growth in recombinant Escherichia coli using artificial neural networks. Biochem. Eng. J. (2008)

  • R. Oliveira. Combining first principles modelling and artificial neural networks: a general framework. Comput. Chem. Eng. (2004)

  • Y.H. Zhu et al. Application of neural network to lysine production. Biochem. Eng. J. (1996)

  • G.A. Moreira et al. Optimization of the Bacillus thuringiensis var. kurstaki HD-1 δ-endotoxins production by using experimental mixture design and artificial neural networks. Biochem. Eng. J. (2007)

  • J.C.B. Gonzaga et al. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput. Chem. Eng. (2009)

  • S. Nandi et al. Hybrid process modelling and optimization strategies integrating neural networks/support vector regression and genetic algorithms: study of benzene isopropylation of Hbeta catalyst. Chem. Eng. J. (2004)

  • L. Nini et al. Lipase catalysed hydrolysis of short-chain substrates in solution and in emulsion: a kinetic study. Biochim. Biophys. Acta (2001)

  • B. Joseph et al. Cold active microbial lipases: some hot issues and recent developments. Biotechnol. Adv. (2008)

  • J.F.M. Burkert et al. Optimization of extracellular lipase production by Geotrichum sp. using factorial design. Bioresour. Technol. (2004)

  • N.F. Starodub. Biosensors for the evaluation of lipase activity. J. Mol. Catal. B: Enzym. (2006)

  • A. Steinbuchel. Biodegradable plastics. Curr. Opin. Biotechnol. (1992)

  • L. Lama et al. Effect of growth condition on endo- and exopolymer biosynthesis in Anabaena cylindrica 10C. Phytochemistry (1996)

  • N. Mallick et al. Process optimization for poly(3-hydroxybutyrate-co-3-hydroxyvalerate) co-polymer production by Nostoc muscorum. Biochem. Eng. J. (2007)

  • L.A.C. Meleiro et al. State and parameter estimation based on a nonlinear filter applied to an industrial process control of ethanol production. Braz. J. Chem. Eng. (2000)

  • Q. Yang. Staged soft-sensor modelling for batch fermentation process

  • P. Wechselberger et al. Real-time estimation of biomass and specific growth rate in physiologically variable recombinant fed-batch processes. Bioprocess Biosyst. Eng. (2013)

  • D. Dochain et al. Dynamical modelling, analysis, monitoring and control design for nonlinear bioprocesses. Adv. Biochem. Eng. Biotechnol. (1997)

  • C.M. Bo et al. The application of neural network soft sensor technology to an advanced control system of distillation operation
