Genetic programming as an analytical tool for non-linear dielectric spectroscopy

https://doi.org/10.1016/S0302-4598(99)00022-7

Abstract

By modelling the non-linear effects of membranous enzymes on an applied oscillating electromagnetic field using supervised multivariate analysis methods, Non-Linear Dielectric Spectroscopy (NLDS) has previously been shown to produce quantitative information that is indicative of the metabolic state of various organisms. The use of Genetic Programming (GP) for the multivariate analysis of NLDS data recorded from yeast fermentations is discussed, and GPs are compared with previous results using Partial Least Squares (PLS) and Artificial Neural Nets (NN). GP considerably outperforms these methods, both in terms of the precision of the predictions and their interpretability.

Introduction

When a suspension of cells is exposed to a static electric field, or to an alternating electric field whose frequency is low relative to that of the classical β-dielectric dispersion, the field does not penetrate to the interior of the cell. Instead, it is dropped almost entirely across the outer membrane of the cell, which is predominantly capacitive at these frequencies and which, because it is so thin, substantially amplifies the field across it (e.g., Refs. [1–3]). In consequence, anything internal to the cell is essentially electrically invisible to a low-frequency electric field, but anything dielectrically active in the membrane may be expected to display properties associated with fields far stronger than that applied externally.
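As a rough illustration of the scale of this amplification, the standard steady-state result for a spherical cell in a uniform low-frequency field (often called the Schwan equation) can be used. That model is not cited in this text and is brought in here as an assumption. The symbols are r (cell radius), d (membrane thickness), E (applied field), θ (angle between the field and the membrane normal) and E_m (field within the membrane). The values r = 3 µm and d = 5 nm are illustrative only and are not taken from the paper.

```latex
\Delta V_m = \tfrac{3}{2}\, r\, E \cos\theta,
\qquad
E_m \approx \frac{\Delta V_m}{d}
\quad\Longrightarrow\quad
\left.\frac{E_m}{E}\right|_{\theta=0} \approx \frac{3r}{2d}
= \frac{3 \times 3\times10^{-6}\,\mathrm{m}}{2 \times 5\times10^{-9}\,\mathrm{m}} = 900 .
```

Under these assumed dimensions, the field across the membrane at the poles of the cell would be of the order of a thousand times the applied field.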

The dielectric response of biological tissue has long been assumed to be linear when the macroscopic exciting field is low (say <0.1 V cm−1, as is typically used). However, substantial non-linear phenomena, in the form of harmonics of the fundamental, are in fact produced, for reasons discussed in Refs. [4, 5]. This has led to the use of non-linear spectroscopy of the dielectric properties of membranous enzymes to indicate and/or influence the metabolic state of cell suspensions [5–11].

Inhibitor and other studies indicated that, in yeast, the non-linear dielectric signal is due mainly to the H+–ATPase located in the cells' plasma membrane [5, 9], and hence NLDS may be used to quantify the use of glucose by yeast cells [11].

Genetic programming [12, 13] is an evolutionary technique which uses the concepts of Darwinian selection to generate and optimise a desired computational function or mathematical expression. It has been comprehensively studied theoretically over the past few years, but applications to real laboratory data as a practical modelling tool are still rather rare [14–20].

The thrust of this paper is to compare the results of modelling the data in Ref. [11] using Genetic Programming (using a program written in-house by RJG [18]) with those previously obtained using Partial Least Squares and Artificial Neural Nets. To summarise the earlier findings, modelling these data was found to require the non-linear modelling abilities of NN, the linear PLS method being unable to approximate the data accurately. GP can also model non-linear data, but it has an additional advantage over NN: NN is a 'black-box' method that tells the user very little about the underlying processes involved in the effect under study, whereas GP generates explicit equations that may be interpretable in terms of the causes of that effect. These equations are still complex in NLDS modelling, and their simplification for interpretation is left for future work; this paper concentrates on the modelling precision of GP in comparison with NN when applied to difficult data such as NLDS spectrograms. GP can also model variations that require the interaction of several measured variables, rather than requiring that these variables be orthogonal.

An initial population of individuals, each encoding a potential solution to the optimisation problem, is generated randomly and their ability to reproduce the desired output is assessed. New individuals are generated either by mutation (the introduction of one or more random changes to a single parent individual) or by crossover (randomly re-arranging functional components between two or more parent individuals). The fitness of the new individuals is then assessed, and the fitter individuals from the total population are more likely to become the parents of the next generation. This process is repeated until either the desired result is achieved or the rate of improvement in the population falls to zero. It has been shown [12] that if the parent individuals are chosen according to their fitness values, the genetic method can approach the theoretical optimum efficiency for a search algorithm.
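The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house program used in the paper. For brevity, individuals here are short lists of numbers rather than function trees. The rank-weighted parent choice, the 50/50 split between mutation and crossover, the survival of only the fittest half of the combined population, and every name and parameter value are assumptions made for the example.

```python
import random

def evolve(fitness, new_individual, mutate, crossover, rng,
           pop_size=40, max_generations=100, target=1e-9, patience=15):
    """Evolve a population; lower fitness is better. Returns (best, history)."""
    initial = [new_individual(rng) for _ in range(pop_size)]
    scored = sorted(((fitness(ind), ind) for ind in initial),
                    key=lambda pair: pair[0])
    history = [scored[0][0]]          # best fitness seen at each generation
    stalled = 0                       # generations without improvement
    for _generation in range(max_generations):
        # Stop when the desired result is reached or improvement has ceased.
        if history[-1] <= target or stalled >= patience:
            break
        # Rank-based weights: fitter individuals are more likely to be parents.
        weights = [pop_size - rank for rank in range(pop_size)]
        children = []
        for _child in range(pop_size):
            parents = rng.choices(scored, weights=weights, k=2)
            mum, dad = parents[0][1], parents[1][1]
            if rng.random() < 0.5:
                child = mutate(mum, rng)
            else:
                child = crossover(mum, dad, rng)
            children.append((fitness(child), child))
        # Assumed survival rule: keep the fittest pop_size of parents + children.
        scored = sorted(scored + children, key=lambda pair: pair[0])[:pop_size]
        stalled = 0 if scored[0][0] < history[-1] else stalled + 1
        history.append(scored[0][0])
    return scored[0][1], history

# Toy problem (hypothetical): recover three hidden coefficients.
TARGET = [1.0, -2.0, 0.5]

def fitness(ind):
    return sum((a - b) ** 2 for a, b in zip(ind, TARGET))

def new_individual(rng):
    return [rng.uniform(-5.0, 5.0) for _ in TARGET]

def mutate(parent, rng):
    child = list(parent)
    child[rng.randrange(len(child))] += rng.gauss(0.0, 0.5)  # one random change
    return child

def crossover(mum, dad, rng):
    cut = rng.randrange(1, len(mum))  # one-point recombination of two parents
    return mum[:cut] + dad[cut:]

best, history = evolve(fitness, new_individual, mutate, crossover, random.Random(0))
```

Because the fittest individuals always survive in this sketch, the best fitness recorded in `history` can never get worse from one generation to the next.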

Section snippets

Data recording

The data sets from Ref. [11] were used in this study to provide a complete comparison with that previous analysis. They comprise two data sets, collected during simple batch fermentations, in which glucose levels were measured in parallel by NLDS and by a reference method (a Reflolux hand-held blood glucose meter). Fermentation 1 contains 47 samples of 150 harmonic variables, and Fermentation 2 contains 49 similar samples collected in a similar fermentation on a separate day.

Each NLDS spectrum-sweep

Method

In order to implement a genetic optimisation of a predictive model, it is necessary to formulate the model in a notation that is amenable to mutation and crossover. Attempting a genetic optimisation using a model formulated either in standard mathematical notation or computer program code will result, in all likelihood, in the generation of non-functional individuals. To overcome this, the genetic program method uses the concept of a function tree, comprising nodes and terminals [12].
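A function tree of this kind can be illustrated with nested tuples. The sketch below is an assumption-laden example rather than the representation used in the paper: the node names (`add`, `sub`, `mul`), the two terminal kinds (`var` for a measured input and `const` for a numeric constant) and the helper functions are all hypothetical. It shows why the tree form suits genetic operators: swapping one subtree for another always yields an expression that can still be evaluated.

```python
import operator
import random

# Function nodes take two child subtrees; names are illustrative only.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, inputs):
    """Evaluate a tree on one sample; terminals are ('var', i) or ('const', c)."""
    kind = tree[0]
    if kind == "var":
        return inputs[tree[1]]
    if kind == "const":
        return tree[1]
    return FUNCTIONS[kind](evaluate(tree[1], inputs), evaluate(tree[2], inputs))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if tree[0] in FUNCTIONS:
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(a, b, rng):
    """Graft a randomly chosen subtree of b onto a randomly chosen point in a."""
    point = rng.choice(list(paths(a)))
    donor = get_subtree(b, rng.choice(list(paths(b))))
    return replace_subtree(a, point, donor)

# (x0 * 2.0) + x1 and (x1 - 0.5), written as trees.
parent_a = ("add", ("mul", ("var", 0), ("const", 2.0)), ("var", 1))
parent_b = ("sub", ("var", 1), ("const", 0.5))
sample = [3.0, 1.0]

value_a = evaluate(parent_a, sample)
child = crossover(parent_a, parent_b, random.Random(1))
value_child = evaluate(child, sample)
```

Mutation can be built from the same helpers, for instance by replacing a randomly chosen node with a freshly generated random subtree.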

A terminal

Results of GP modelling

The data in Fermentation 1 are split into odd-numbered and even-numbered samples: the 24 odd-numbered samples form the training set and the 23 even-numbered samples form the validation set. The GP model formed on these was used to predict the 49 unseen samples of Fermentation 2, which comprise the test set. Note that the training and validation sets are taken from one fermentation and the test set from another, entirely separate fermentation. In a situation such as NLDS, where instrument response can vary considerably, this means
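The odd/even partition of Fermentation 1 described above can be checked with a short sketch. The sample numbers below are placeholders standing in for the real spectra, and the function name is hypothetical.

```python
def split_odd_even(samples):
    """Odd-numbered samples (1st, 3rd, ...) train; even-numbered ones validate."""
    training = samples[0::2]
    validation = samples[1::2]
    return training, validation

# Placeholder for the 47 samples of Fermentation 1, numbered from 1.
fermentation_1 = list(range(1, 48))
training, validation = split_odd_even(fermentation_1)
counts = (len(training), len(validation))
```

With 47 samples this gives 24 training and 23 validation samples, matching the counts stated in the text.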

Conclusions

NLDS data provide a rigorous test-bed for multivariate modelling methods, having a small signal variation hidden in large uncorrelated instrumental fluctuations. The resulting error surface is very noisy and the global minimum appears to be very localised, requiring a very efficient search strategy to be used by the modelling process. The relationship between measured variables and the reference variable is also non-linear, restricting the choice of modelling methods. Neural nets achieve

Acknowledgements

AMW and DBK thank the Wellcome Trust, under the terms of the Sir Henry Wellcome SHoWCASe Award scheme, and RJG and DBK thank the UK EPSRC, for financial support.

References (29)

Cited by (19)

  • Maximization of extraction of Cadmium and Zinc during recycling of spent battery mix: An application of combined genetic programming and simulated annealing approach

    2019, Journal of Cleaner Production
    Citation Excerpt :

    Predictive modelling methods based on Artificial intelligence (AI) seems a better alternative. Among AI methods, evolutionary approach of genetic programming (GP) has the ability to automate the model structure and coefficients estimation resulting in the evolution of the best model (Woodward et al., 1999). The GP model has a free non-linear form that has the best fits.

  • Fast nonlinear region localisation for nonlinear dielectric spectroscopy of biological suspensions

    2013, Biosensors and Bioelectronics
    Citation Excerpt :

    The difference between the frequency spectra of the two measurements was adopted as the nonlinear contribution of biological material. Despite the numerous attempts, the nonlinear component of EEI could not be avoided (Woodward et al., 1996, 1999, 2000). Others investigators also failed to eliminate the nonlinearity of the EEI despite using high technology systems (Nawarathna et al., 2005a, 2005b, 2006).

  • Prediction of PM10 concentrations through multi-gene genetic programming

    2010, Atmospheric Pollution Research
    Citation Excerpt :

    Two case studies were presented and, in both cases, GP and neural networks presented similar errors. Other studies were also presented (Woodward et al., 1999; Tang and Li, 2002), showing the capacity of GP to achieve an adequate model for nonlinear relationships between variables. The present study aims to evaluate the performance of MGP for predicting the daily average PM10 concentrations.

  • Obtaining transparent models of chaotic systems with multi-objective simulated annealing algorithms

    2008, Information Sciences
    Citation Excerpt :

    Our aim is to discover a consistent subset of state variables and the equations that relate them, and also the numerical values of the coefficients in these equations. Some of the most recent approaches to obtain this information are based on evolutionary techniques, combined with a tree-based representation of the model [4,6,14,19,46,56]. In Fig. 3, there is a simplified example of such a representation, that will be explained in depth in Section 3.2.

  • Non-linear Methods for the Analysis of Metabolic Profiles

    2007, The Handbook of Metabonomics and Metabolomics