Adapting genetic regulatory models by genetic programming

doi:10.1016/j.biosystems.2004.05.014

Biosystems

Volume 76, Issues 1–3, August–October 2004, Pages 217-227

https://doi.org/10.1016/j.biosystems.2004.05.014

Abstract

In this paper, we focus on the task of adapting genetic regulatory models based on gene expression data from microarrays. Our approach aims at automatic revision of qualitative regulatory models to improve their fit to expression data. We describe a type of regulatory model designed for this purpose, a method for predicting the quality of such models, and a method for adapting the models by means of genetic programming. We also report experimental results highlighting the ability of the methods to infer models on a number of artificial data sets. In closing, we contrast our results with those of alternative methods, after which we give some suggestions for future work.

Introduction

Bioinformatics is a field driven by the rapid accumulation of molecular biology data. Until recently, the flood of data becoming available mainly consisted of DNA and amino acid sequences. With the rapid advances made in sequencing technology, it became routine to sequence genes, and feasible to sequence even entire genomes. For many years, therefore, bioinformatics was focused on dealing with these sequence collections, and developing computer algorithms for tasks such as finding coding regions and exons in DNA, analysing evolutionary relationships by aligning sequences and identifying similarities, trying to predict protein 2D and 3D structure from amino acid sequence data, etc.

Today, however, bioinformatics has undergone a rapid, dramatic, and fundamental change of focus. The reason for this is that microarray technology, popularly referred to as “gene chips”, has become a mature technology, and it has become routine for molecular biologists to collect expression data for thousands of genes under varying conditions. Studying changes in the expression levels of genes in response to environmental change, medication, exposure to toxins, or other stimuli, has rapidly become one of the standard techniques for gaining insight into the function of the proteins encoded by these genes. In addition, microarray technology also opens up the possibility of understanding not only which genes are involved in the response to particular stimuli, but also the networks involved in regulating the expression of these genes.

This paper will focus on one of the exciting possibilities opened up by the advent of microarray technology, namely to utilize the availability of gene expression data to infer regulatory relationships between genes. If a gene has a regulatory impact on another gene, we can reasonably assume that—at least in some cases—this should be detectable from the expression data. It is now, therefore, of urgent interest to explore the possibility of developing methods for inferring large networks of gene interactions from gene expression data.

Kohane et al. (2003) point out that standard statistical techniques for elucidating relationships among multiple variables do not hold up well when applied to gene expression data sets, because such data sets are underdetermined. They contain measurements of very high dimensionality (on the order of thousands of variables) but for only a small number of cases (on the order of tens to hundreds). As a result, many different models fit the data equally well, and additional knowledge of the domain is required to resolve the ambiguities. Modeling genomic data sets therefore requires new approaches.
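The underdetermination argument can be made concrete with a deliberately tiny toy example. It is an illustration only, not a model from the paper: a linear model with three variables fitted to just two cases, where two different weight vectors (the hypothetical `w_a` and `w_b`) reproduce the data exactly.

```python
# Toy illustration of underdetermination: 2 cases (rows) but 3 variables (columns).
# This linear model is a stand-in chosen for clarity, not the qualitative models used in the paper.
X = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
y = [2.0, 3.0]


def residuals(weights):
    """Difference between the model's prediction and the observed value for each case."""
    return [sum(x * w for x, w in zip(row, weights)) - target for row, target in zip(X, y)]


w_a = [2.0, 3.0, 0.0]
# X applied to this direction gives [0, 0], so adding any multiple of it leaves the fit unchanged.
null_direction = [1.0, 1.0, -1.0]
w_b = [a + 5.0 * n for a, n in zip(w_a, null_direction)]  # [7.0, 8.0, -5.0]

print(residuals(w_a), residuals(w_b))  # [0.0, 0.0] [0.0, 0.0]
```

With thousands of genes and only tens of arrays, the space of such equally good fits is vastly larger, which is the gap that expert knowledge is meant to close.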

Because expression data are underdetermined, inferring regulatory models without any bias towards plausible models is often not feasible for real-world data. Moreover, the networks that can currently be inferred by such techniques appear to be far too small. We think these issues call for greater use of expert knowledge in the discovery of regulatory models, as well as a preference for qualitative over quantitative models. Our approach aims to achieve this in an interactive environment that lets experts repeatedly state qualitative regulatory models, evaluate how well the models fit the expression data, specify constraints on the search for revised models, run that search, and select the revisions they find plausible. Similar approaches have been reported by Iba and Mimura (2002) and Shrager et al. (2002).

Section snippets

Modeling gene regulation

In selecting the type of regulatory model to fit to the expression data, we conclude that qualitative models are an appropriate choice for the reasons outlined above. Our regulatory models are qualitative in that they specify only directions of influence in a non-recurrent network of genes. Since the real biological networks that we model are believed to be highly recurrent at the lowest level of abstraction, our models aim to explain only highly abstract properties of those networks. In…
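The snippet breaks off before the model representation is described, so the following is only a hypothetical illustration of what "directions of influence in a non-recurrent network" can mean computationally. The names `Model`, `is_non_recurrent` and `predict_directions`, the ±1 sign encoding, and the propagation rule are all assumptions of this sketch, not the authors' design. It checks that a network has no feedback loops and propagates up/down changes from perturbed genes along signed edges.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Optional

# Hypothetical representation (not taken from the paper): each target gene maps to
# its regulators, each tagged +1 (activation) or -1 (repression).
Model = dict[str, dict[str, int]]


def _graph(model: Model, extra=()) -> dict[str, set[str]]:
    """Node -> predecessors (its regulators), the form TopologicalSorter expects."""
    graph = {target: set(regs) for target, regs in model.items()}
    for gene in extra:
        graph.setdefault(gene, set())
    return graph


def is_non_recurrent(model: Model) -> bool:
    """True if the regulatory network has no feedback loops (is acyclic)."""
    try:
        tuple(TopologicalSorter(_graph(model)).static_order())
        return True
    except CycleError:
        return False


def predict_directions(model: Model, perturbed: dict[str, int]) -> dict[str, Optional[int]]:
    """Propagate qualitative changes: +1 up, -1 down, 0 unchanged, None ambiguous.
    Raises CycleError if the model is recurrent."""
    state: dict[str, Optional[int]] = {}
    # static_order() yields regulators before the genes they regulate.
    for gene in TopologicalSorter(_graph(model, perturbed)).static_order():
        if gene in perturbed:
            state[gene] = perturbed[gene]
            continue
        effects = set()
        for regulator, sign in model.get(gene, {}).items():
            change = state[regulator]
            if change is None:
                effects.add(None)
            elif change != 0:
                effects.add(sign * change)
        if not effects:
            state[gene] = 0
        elif len(effects) == 1:
            state[gene] = effects.pop()
        else:
            state[gene] = None  # conflicting or ambiguous influences
    return state


model = {"B": {"A": 1}, "C": {"A": 1, "B": -1}, "D": {"B": -1}}
print(is_non_recurrent(model))
print(predict_directions(model, {"A": 1}))
```

Here raising gene A raises B and lowers D, while C receives opposing influences and is left ambiguous; a purely qualitative model cannot resolve such conflicts without further information.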

Optimizing regulatory models

The method we have chosen for adapting the regulatory models uses an evolutionary algorithm (EA) to improve the models according to a quality measure. This allows experts to revise one or more working models by seeding the population with them. Domain knowledge can also be incorporated in the design of the evaluation function, the representation, and the variation operators.
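As a minimal sketch of how seeding works, the code below uses a generic mutation-only EA with elitism and binary tournament selection over a toy signed-edge encoding. This is a swapped-in simplification, not the GP system described in the paper. The functions `random_model`, `mutate`, `tournament` and `evolve`, the encoding, and the `agreement` fitness (which compares against a known target rather than against expression data) are all hypothetical.

```python
import random

# Hypothetical toy encoding (not the paper's GP representation): a model is a tuple
# holding the sign of each candidate regulatory edge: +1, -1, or 0 (edge absent).
SIGNS = (-1, 0, 1)


def random_model(rng, n_edges):
    """A random model, used to fill the population beyond the expert seeds."""
    return tuple(rng.choice(SIGNS) for _ in range(n_edges))


def mutate(model, rng):
    """Copy the model with one randomly chosen edge given a random sign."""
    i = rng.randrange(len(model))
    return model[:i] + (rng.choice(SIGNS),) + model[i + 1:]


def tournament(ranked, fitness, rng):
    """Binary tournament: the fitter of two randomly drawn models wins."""
    a, b = rng.sample(ranked, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(seeds, fitness, n_edges, pop_size=30, generations=50, seed=0):
    """Mutation-only EA with elitism whose initial population is seeded with expert models."""
    rng = random.Random(seed)
    population = list(seeds)[:pop_size]
    while len(population) < pop_size:
        population.append(random_model(rng, n_edges))
    n_elite = max(1, pop_size // 10)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elites survive unchanged, so the best fitness in the population never decreases.
        offspring = [mutate(tournament(ranked, fitness, rng), rng) for _ in range(pop_size - n_elite)]
        population = ranked[:n_elite] + offspring
    return max(population, key=fitness)


# Usage with made-up data: fitness counts agreement with a known target network.
target = (1, 0, -1, 1, 0, 0)


def agreement(model):
    return sum(a == b for a, b in zip(model, target))


expert_guess = (1, 0, 0, 1, 0, -1)
best = evolve([expert_guess], agreement, n_edges=len(target))
print(best, agreement(best))
```

Because of elitism, a seeded expert model can only be matched or improved upon, never lost, which is the property that makes seeding a natural way to let the search revise a working model.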

All EAs require a fitness function that measures solution quality. In our case, the quality of a solution is given by its…

GP design issues

The GP system we used is based on ECJ 9 (Luke, 2002). Unless stated otherwise below, the system uses the default methods and parameters of ECJ 9.

Experiments

We conducted a series of experiments to evaluate our methods. Although the methods were designed to let experts revise working models by seeding the initial population, we decided to evaluate their ability to infer regulatory models without being given such models before evaluating their ability to revise models. The initial population was therefore initialized with small random programs, which is common practice in GP. For ease of evaluation, models were fitted to…
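Small random programs are typically produced by a depth-limited random tree builder. The sketch below shows Koza's grow method under an entirely hypothetical primitive set (functions `join` and `flip`, gene terminals `g1` to `g3`); it is not the primitive set or tree builder the authors used, neither of which is specified in this excerpt.

```python
import random

# Hypothetical primitive set for illustration only; the paper's function and
# terminal sets are not given in this excerpt.
FUNCTIONS = {"join": 2, "flip": 1}  # name -> arity
TERMINALS = ["g1", "g2", "g3"]      # e.g. gene identifiers


def grow(rng, max_depth):
    """Koza-style 'grow' initialization: below the depth limit, pick any primitive;
    at the limit, pick only terminals. Trees are nested tuples (name, *children)."""
    if max_depth == 0:
        return rng.choice(TERMINALS)
    primitive = rng.choice(list(FUNCTIONS) + TERMINALS)
    if primitive in TERMINALS:
        return primitive
    return (primitive, *(grow(rng, max_depth - 1) for _ in range(FUNCTIONS[primitive])))


def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


rng = random.Random(42)
initial_population = [grow(rng, max_depth=3) for _ in range(20)]
print(initial_population[:3])
```

Keeping the depth limit low is what makes the initial programs "small"; larger structures must then be built up by the variation operators during the run.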

Results

In the first experiment, we tried to infer a 10-gene network. Fig. 4 shows the best and average fitness values of the population over 100 generations, averaged over 10 runs. The error bars show 95% confidence intervals for the mean, assuming a t-distribution. In this experiment, an individual whose network fits the data perfectly would receive a fitness value of 10. In the most successful run, such an individual was found in generation 19. Fig. 5 shows that the target and best…
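For readers reproducing such error bars, the following sketch computes a mean and its 95% confidence interval over 10 runs, assuming the t-distribution mentioned in the text. The function name `mean_with_ci95` and the hard-coded critical value t(0.975, 9) ≈ 2.262 are assumptions of this sketch; the paper does not show its computation.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (n = 10 runs).
T_CRIT_95_DF9 = 2.262


def mean_with_ci95(values):
    """Mean and 95% t-based confidence interval, valid only for exactly 10 runs."""
    if len(values) != 10:
        raise ValueError("the hard-coded critical value assumes exactly 10 runs")
    mean = statistics.mean(values)
    # Half-width = t * sample standard deviation / sqrt(n).
    half_width = T_CRIT_95_DF9 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width
```

Applied per generation to the best (or average) fitness of each of the 10 runs, this yields one point and one error bar per generation.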

Discussion

The best solutions found in the experiments had fitness values amounting to 100, 92, 80, 71, and 58% of the optimal values when inferring networks with 10, 20, 40, 80, and 160 genes, respectively. To see how the inference capabilities of our methods scale with the number of genes in the target network, a more reliable measure is the averaged best fitness in the final generation, expressed as a percentage of the fitness of a perfect individual. Applying this measure to our results yields the values…
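The difference between the two measures is simple arithmetic, sketched below with made-up run values (not the paper's results). The function names are hypothetical, and so is the assumption that the perfect fitness equals the number of genes, which the text confirms only for the 10-gene case.

```python
import statistics


def percent_best_single(final_best_per_run, perfect_fitness):
    """Best individual over all runs, as a percentage of the perfect fitness."""
    return 100.0 * max(final_best_per_run) / perfect_fitness


def percent_averaged_best(final_best_per_run, perfect_fitness):
    """Final-generation best fitness averaged over runs, as a percentage of the
    perfect fitness: less sensitive to a single lucky run."""
    return 100.0 * statistics.mean(final_best_per_run) / perfect_fitness


runs = [8, 9, 10, 9]  # hypothetical final best fitness of four runs on a 10-gene network
print(percent_best_single(runs, 10), percent_averaged_best(runs, 10))  # 100.0 90.0
```

One successful run is enough to push the single-best measure to 100%, whereas the averaged measure reflects how reliably the method gets there.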

References (19)

Akutsu, T., et al., 1999. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac. Symp. Biocomput.
Ando, S., Iba, H., 2001. Inference of gene regulatory model by genetic algorithms. In: Proceedings of the 2001 IEEE...
Arnone, M., et al., 1997. The hardwiring of development: organization and function of genomic regulatory systems. Development.
Banzhaf, W., Nordin, P., Keller, R., Francone, F., 1998. Genetic Programming: An Introduction. Morgan Kaufmann...
Friedman, N., et al., 2000. Using Bayesian networks to analyze expression data. J. Comput. Biol.
Gruau, F., 1992. Genetic synthesis of Boolean neural networks with a cell rewriting developmental process. In: Whitley,...
Hoyle, D., et al., 2002. Making sense of microarray data distributions. Bioinformatics.
Iba, H., et al., 2002. Inference of a gene regulatory network by means of interactive evolutionary computing. Inf. Sci.
Kohane, I.S., Kho, A.T., Butte, A.J., 2003. Microarrays for an Integrative Genomics. MIT...

There are more references available in the full text version of this article.

Cited by (19)

Inferring gene regulatory networks with hybrid of multi-agent genetic algorithm and random forests based on fuzzy cognitive maps
2018, Applied Soft Computing Journal
Citation Excerpt :
For example, Ramteke et al. [37] used a real-coded genetic algorithm (GA) to enhance the performance of genetic algorithm, which was termed as simulated binary jumping gene. Eriksson et al. [10] proposed genetic programming for inferring discrete GRNs. Chao et al. [7] used GA to search feed forward regulatory genes, which was based on the recurrent neural network model.
Inferring gene regulatory networks (GRNs) from expression data is an important and challenging problem in the field of computational biology. With the growth of high-throughput gene expression data, GRN inference has attracted much interest from researchers. In this paper, we focus on inferring large-scale GRNs using a fast and accurate algorithm. We first use fuzzy cognitive maps (FCMs) to model GRNs. Then, multi-agent genetic algorithm (MAGA) is used to determine regulatory links, and random forests (RF) are used as the feature selection algorithm to initialize the agents, which can reduce the search space of MAGA according to the gene ranking. We improve the genetic operators of MAGA to cope with GRN inference. The proposed algorithm is termed as MAGARF_FCM-GRN. In the experiments, the performance of MAGARF_FCM-GRN is validated on synthetic data and the well-known benchmark DREAM3 and DREAM4. The results show that MAGARF_FCM-GRN can infer directed GRNs with high accuracy and efficiency.
Constructing gene regulatory networks from microarray data using GA/PSO with DTW
2012, Applied Soft Computing Journal
Citation Excerpt :
For example, Chan et al. used three computational intelligence methods including least angle regression (LARS), expectation maximization (EM) with Kalman filter (KF) and evolving fuzzy neural network (EFuNN) [15] to infer GRNs. Eriksson and Olsson inferred GRNs using genetic programming [16]. Tian proposed a stochastic model which is based on noise of the microarray experiments to predict GRNs [17].
Recently, many methods have been proposed for constructing gene regulatory networks (GRNs). However, most of the existing methods ignored the time delay regulatory relation in the GRN predictions. In this paper, we propose a hybrid method, termed GA/PSO with DTW, to construct GRNs from microarray datasets. The proposed method uses test of correlation coefficient and the dynamic time warping (DTW) algorithm to determine the existence of a time delay relation between two genes. In addition, it uses the particle swarm optimization (PSO) to find thresholds for discretizing the microarray dataset. Based on the discretized microarray dataset and the predicted types of regulatory relations among genes, the proposed method uses a genetic algorithm to generate a set of candidate GRNs from which the predicted GRN is constructed. Three real-life sub-networks of yeast are used to verify the performance of the proposed method. The experimental results show that the GA/PSO with DTW is better than the other existing methods in terms of predicting sensitivity and specificity.
Stochastic models for inferring genetic regulation from microarray gene expression data
2010, BioSystems
Microarray expression profiles are inherently noisy and many different sources of variation exist in microarray experiments. It is still a significant challenge to develop stochastic models to realize noise in microarray expression profiles, which has profound influence on the reverse engineering of genetic regulation. Using the target genes of the tumour suppressor gene p53 as the test problem, we developed stochastic differential equation models and established the relationship between the noise strength of stochastic models and parameters of an error model for describing the distribution of the microarray measurements. Numerical results indicate that the simulated variance from stochastic models with a stochastic degradation process can be represented by a monomial in terms of the hybridization intensity and the order of the monomial depends on the type of stochastic process. The developed stochastic models with multiple stochastic processes generated simulations whose variance is consistent with the prediction of the error model. This work also established a general method to develop stochastic models from experimental information.
Soya protein-and casein-based nutritionally complete diets fed during gestation and lactation differ in effects on characteristics of the metabolic syndrome in male offspring of Wistar rats
2012, British Journal of Nutrition
An improved transient search optimization with neighborhood dimensional learning for global optimization problems
2021, Symmetry
Reconstructing gene regulatory networks via memetic algorithm and LASSO based on recurrent neural networks
2020, Soft Computing