Elsevier

Biosystems

Volume 76, Issues 1–3, August–October 2004, Pages 217-227

Adapting genetic regulatory models by genetic programming

https://doi.org/10.1016/j.biosystems.2004.05.014

Abstract

In this paper, we focus on the task of adapting genetic regulatory models based on gene expression data from microarrays. Our approach aims at automatic revision of qualitative regulatory models to improve their fit to expression data. We describe a type of regulatory model designed for this purpose, a method for predicting the quality of such models, and a method for adapting the models by means of genetic programming. We also report experimental results highlighting the ability of the methods to infer models on a number of artificial data sets. In closing, we contrast our results with those of alternative methods, after which we give some suggestions for future work.

Introduction

Bioinformatics is a field driven by the rapid accumulation of molecular biology data. Until recently, the flood of data becoming available mainly consisted of DNA and amino acid sequences. With the rapid advances made in sequencing technology, it became routine to sequence genes, and feasible to sequence even entire genomes. For many years, therefore, bioinformatics was focused on dealing with these sequence collections, and developing computer algorithms for tasks such as finding coding regions and exons in DNA, analysing evolutionary relationships by aligning sequences and identifying similarities, trying to predict protein 2D and 3D structure from amino acid sequence data, etc.

Today, however, bioinformatics has undergone a rapid, dramatic, and fundamental change of focus. The reason for this is that microarray technology, popularly referred to as “gene chips”, has become a mature technology, and it has become routine for molecular biologists to collect expression data for thousands of genes under varying conditions. Studying changes in the expression levels of genes in response to environmental change, medication, exposure to toxins, or other stimuli, has rapidly become one of the standard techniques for gaining insight into the function of the proteins encoded by these genes. In addition, microarray technology also opens up the possibility of understanding not only which genes are involved in the response to particular stimuli, but also the networks involved in regulating the expression of these genes.

This paper will focus on one of the exciting possibilities opened up by the advent of microarray technology, namely to utilize the availability of gene expression data to infer regulatory relationships between genes. If a gene has a regulatory impact on another gene, we can reasonably assume that—at least in some cases—this should be detectable from the expression data. It is now, therefore, of urgent interest to explore the possibility of developing methods for inferring large networks of gene interactions from gene expression data.

Kohane et al. (2003) point out that standard statistical techniques for elucidating relationships between multiple variables do not hold up well when applied to gene expression data sets, because such data sets are underdetermined. They contain measurements of very high dimensionality (on the order of thousands of variables) but only for a small number of cases (on the order of tens to hundreds), which means that multiple models fit the data equally well and that additional knowledge of the learning domain is required to resolve the ambiguities. The modeling of genomic data sets therefore requires new approaches.
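The ambiguity can be illustrated with a toy example. The sketch below is not taken from the paper: the gene names, the discretised expression values, and the restriction to single-regulator models are all assumptions made for illustration. It lists every candidate regulator that explains a target gene perfectly when only two samples are available.

```python
# Made-up, discretised expression data (an assumption for illustration):
# each row is one sample, each column one gene; 1 = expressed, 0 = not.
GENES = ["g1", "g2", "g3", "g4", "target"]
SAMPLES = [
    (1, 1, 0, 1, 1),
    (0, 0, 1, 0, 0),
]


def consistent_models(samples, genes, target="target"):
    """Return every single-regulator model that reproduces the target in all samples."""
    t = genes.index(target)
    models = []
    for i, gene in enumerate(genes):
        if i == t:
            continue
        for effect in ("activates", "represses"):
            # An activator predicts target == regulator; a repressor predicts the opposite.
            if all((row[i] == row[t]) == (effect == "activates") for row in samples):
                models.append((gene, effect))
    return models


models = consistent_models(SAMPLES, GENES)
print(models)
```

With two samples, every one of the four candidate genes yields a model that fits the data perfectly, so the data alone cannot single out a regulator. This is the situation in which prior knowledge has to break the tie.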

Because expression data are underdetermined, inferring regulatory models without any bias towards plausible models is often not applicable to real-world data. Also, the networks that such techniques can currently infer seem far too small. We think that these issues call for an increased use of expert knowledge in the discovery of regulatory models, as well as a preference for qualitative models over quantitative ones. The approach we have adopted aims to achieve this in an interactive environment that lets experts repeatedly state qualitative regulatory models, evaluate how well the models fit the expression data, specify constraints on the search for revised models, search for revised models, and select the revisions that they find plausible. Similar approaches have been reported by Iba and Mimura (2002) and Shrager et al. (2002).

Section snippets

Modeling gene regulation

In selecting the type of regulatory model to fit to the expression data we conclude that qualitative models would be an appropriate choice for the reasons outlined above. Our regulatory models are qualitative in that they only specify directions of influence in a non-recurrent network of genes. Since the real biological networks that we model are believed to be highly recurrent at the lowest level of abstraction, our models only aim to explain highly abstract properties of those networks. In
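As a rough illustration of what a qualitative, non-recurrent model could look like in code, the sketch below stores only the direction of each influence and checks that the network has no cycles. The dictionary encoding and the function name `is_non_recurrent` are hypothetical; the excerpt does not describe the paper's actual representation.

```python
# Hypothetical encoding (an assumption, not the paper's representation):
# model[regulator][target] = +1 (activates) or -1 (represses).

def is_non_recurrent(model):
    """Return True if the influence graph has no directed cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(gene):
        colour[gene] = GREY
        for target in model.get(gene, {}):
            state = colour.get(target, WHITE)
            if state == GREY:
                return False  # edge back into the current path: a feedback loop
            if state == WHITE and not visit(target):
                return False
        colour[gene] = BLACK
        return True

    return all(colour.get(g, WHITE) != WHITE or visit(g) for g in list(model))


feed_forward = {"A": {"B": +1, "C": -1}, "B": {"C": +1}}
feedback_loop = {"A": {"B": +1}, "B": {"A": -1}}

print(is_non_recurrent(feed_forward))
print(is_non_recurrent(feedback_loop))
```

A check of this kind would let a search procedure reject candidate models that introduce feedback loops, which the model type described here excludes.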

Optimizing regulatory models

The method we have chosen to adapt the regulatory models uses an evolutionary algorithm (EA) to improve the models according to a quality measure. This allows experts to revise one or more working models by seeding the population with them. Domain knowledge can also be incorporated in the design of the evaluation function, the representation, and the variation operators.
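The idea of seeding can be sketched with a deliberately simple evolutionary loop. This is not the paper's GP system: it is a mutation-only EA with tournament selection and elitism, and the toy fitness function, the tuple encoding of edge signs, and all names (`evolve`, `toy_fitness`, `expert_model`) are assumptions for illustration.

```python
import random


def evolve(fitness, random_individual, mutate, seeds=(), pop_size=20,
           generations=30, rng=None):
    """Mutation-only EA with binary tournaments and elitism (illustrative sketch)."""
    rng = rng or random.Random(0)
    # Seeding: expert-supplied working models enter the initial population as-is;
    # the remaining slots are filled with random individuals.
    population = list(seeds)[:pop_size]
    population += [random_individual(rng) for _ in range(pop_size - len(population))]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(population, 2)
            parent = a if fitness(a) >= fitness(b) else b
            children.append(mutate(parent, rng))
        population = children
    return max(population, key=fitness)


# Toy problem: an individual is a tuple of edge signs (-1, 0, +1).
TARGET = (1, -1, 0, 1, 0, -1)  # made-up "true" signs, used only by the toy fitness


def toy_fitness(ind):
    return sum(a == b for a, b in zip(ind, TARGET))


def random_individual(rng):
    return tuple(rng.choice((-1, 0, 1)) for _ in TARGET)


def mutate(ind, rng):
    i = rng.randrange(len(ind))
    return ind[:i] + (rng.choice((-1, 0, 1)),) + ind[i + 1:]


expert_model = (1, -1, 0, 0, 0, 0)  # hypothetical, partially correct working model
best = evolve(toy_fitness, random_individual, mutate, seeds=[expert_model])
print(best, toy_fitness(best))
```

Because the best individual is carried over unchanged in each generation, the result is never worse than the seeded expert model. In a real setting the toy fitness would be replaced by a measure of fit to the expression data.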

All EAs require a fitness function as a solution quality measure. In our case the quality of a solution is given by its

GP design issues

The GP system we used is based on ECJ 9 (Luke, 2002). Unless explicitly stated below, methods and parameters of the system are those that come by default in ECJ 9.

Experiments

We conducted a series of experiments to evaluate our methods. Although the methods were designed to allow experts to revise working models through the seeding of the initial population, we decided to evaluate their ability to infer regulatory models without being provided with such models before evaluating their model revising ability. The initial population was therefore initialized with small random programs, which is common practice in GP. For ease of evaluation, models were fitted to

Results

In the first experiment we tried to infer a 10-gene network. Fig. 4 shows the best and average fitness values of the population for 100 generations, averaged over 10 runs. The error bars show the 95% confidence interval of the mean, assuming a t-distribution. In this experiment an individual whose network fits the data perfectly would receive a fitness value of 10. In the most successful run, such an individual was found in generation 19. Fig. 5 shows that the target and best
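For readers who want to reproduce error bars of this kind, a minimal sketch of a t-based 95% confidence interval over 10 runs follows. The per-run fitness values are made up for illustration and are not the paper's results; the critical value 2.262 applies only to 10 runs (9 degrees of freedom).

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 runs).
T_CRIT_95_DF9 = 2.262


def mean_with_ci(values, t_crit=T_CRIT_95_DF9):
    """Return (mean, lower, upper) for a t-based confidence interval of the mean."""
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Made-up best-fitness values from 10 hypothetical runs (not the paper's data).
best_fitness_per_run = [10, 9, 9, 8, 10, 9, 8, 9, 10, 9]
mean, low, high = mean_with_ci(best_fitness_per_run)
print(f"mean={mean:.2f}, 95% CI=[{low:.2f}, {high:.2f}]")
```

For a different number of runs, the critical value has to be looked up for the corresponding degrees of freedom.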

Discussion

The best solutions found in the experiments had fitness values amounting to 100, 92, 80, 71, and 58% of the optimal values when inferring networks with 10, 20, 40, 80, and 160 genes, respectively. To see how our methods' inference capabilities scale with the number of genes in the target networks, a more reliable measure is the averaged best fitness in the final generation, expressed as a percentage of the fitness of a perfect individual. Applying this measure to our results yields the values

References (19)

  • H. Iba et al., 2002. Inference of a gene regulatory network by means of interactive evolutionary computing. Inf. Sci.
  • T. Akutsu et al., 1999. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac. Symp. Biocomput.
  • Ando, S., Iba, H., 2001. Inference of gene regulatory model by genetic algorithms. In: Proceedings of the 2001 IEEE...
  • M. Arnone et al., 1997. The hardwiring of development: organization and function of genomic regulatory systems. Development
  • Banzhaf, W., Nordin, P., Keller, R., Francone, F., 1998. Genetic Programming: An Introduction. Morgan Kaufmann...
  • N. Friedman et al., 2000. Using Bayesian networks to analyze expression data. J. Comput. Biol.
  • Gruau, F., 1992. Genetic synthesis of Boolean neural networks with a cell rewriting developmental process. In: Whitley,...
  • D. Hoyle et al., 2002. Making sense of microarray data distributions. Bioinformatics
  • Kohane, I.S., Kho, A.T., Butte, A.J., 2003. Microarrays for an Integrative Genomics. MIT...
