Elsevier

Fluid Phase Equilibria

Volume 329, 15 September 2012, Pages 71-77

Gene expression programming strategy for estimation of flash point temperature of non-electrolyte organic compounds

https://doi.org/10.1016/j.fluid.2012.05.015

Abstract

The accuracy and predictive capability of correlations and models for determining the flammability characteristics of chemical compounds are of great significance in various chemical industries. In the present study, the main focus is on introducing and applying the gene expression programming (GEP) mathematical strategy to develop a comprehensive empirical method for this purpose. This work presents an empirical correlation to predict the flash point temperature of 1471 (non-electrolyte) organic compounds from 77 different chemical families. The parameters of the correlation include the molecular weight, critical temperature, critical pressure, acentric factor, and normal boiling point of the compounds. The obtained statistical parameters, including the root mean square errors of the results with respect to DIPPR 801 data (8.8, 8.9, and 8.9 K for the training, optimization, and prediction sets, respectively), demonstrate improved accuracy of the presented correlation compared with previously proposed methods available in the open literature.

Highlights

► The novel gene expression programming strategy is introduced as a powerful modeling tool.
► A corresponding states model for flash point temperature is presented using the strategy.
► Statistical parameters demonstrate that the model results are in good agreement with experimental data.

Introduction

The term flash point (FP) refers to the lowest temperature at which a liquid gives off sufficient vapor to form an ignitable mixture with air near the surface of the liquid or within the vessel used [1]. FP values are essential information for the safe transportation, storage, and use of combustible liquids [2], [3], [4], [5].

Experimental measurement of FP is expensive and may carry high practical uncertainties. Therefore, the calculation of FP for various compounds has been the subject of many theoretical studies aimed at developing accurate models. These investigations can be classified into three main categories: “empirical correlations”, “quantitative structure-property relationship (QSPR)” models, and the well-known “group contribution” methods. It should be noted that group contribution methods are a special form of QSPR; however, they are considered a separate class because of their ease of use and wide range of applications. Good reviews of the various methods proposed for FP are available in the literature [6], [7], [8].

The first category contains correlations that need at least one other physical property, such as normal boiling point, density, vapor pressure, critical properties, or enthalpy of vaporization [8]. Models in this class include the empirical correlations proposed by Prugh (200 compounds, average absolute error 11 K, and maximum deviation 500%) [9], Fujii and Herman (168 compounds, for 89% of compounds within ±10 K) [10], Patil (950 compounds, several models with AARD% of 10%) [11], Suzuki et al. (400 compounds, average absolute error 13.52 K) [12], Satyanarayana and Kakati (250 compounds, AARD% of 8.3%, maximum deviation 32.72%) [13], Satyanarayana and Rao (1221 compounds, several correlations) [14], Metcalfe and Metcalfe (201 compounds, average absolute error 8.6 K, maximum error 26.2 K) [15], Hshieh (207 compounds, average absolute error 11.06 K) [16], Catoire and Naudet (evaluated using 1471 compounds [4]: AARD of 2.44% and average absolute error of 8.28 K) [7], and Gharagheizi et al. (the former model: AARD of 2.4% and average absolute error of 8.06 K; the latter model: AARD of 2.14%) [4], [5].
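
The statistical parameters quoted for these models (average absolute error, AARD%, and root mean square error) can be illustrated with a minimal sketch. It assumes the conventional definitions of these quantities; the exact formulas used in each cited study are not given in this text, and the function name `error_metrics` and the sample values are hypothetical.

```python
import math

def error_metrics(predicted, observed):
    """Return (AAE, AARD %, RMSE) for paired predicted/observed values in kelvin.

    Conventional definitions are assumed here:
      AAE   = mean(|pred - obs|)
      AARD% = 100 * mean(|pred - obs| / |obs|)
      RMSE  = sqrt(mean((pred - obs)^2))
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    aae = sum(abs_err) / n
    aard = 100.0 * sum(e / abs(o) for e, o in zip(abs_err, observed)) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return aae, aard, rmse

# Made-up flash points in K, for illustration only (not DIPPR 801 data).
observed = [300.0, 350.0, 400.0]
predicted = [303.0, 343.0, 410.0]
aae, aard, rmse = error_metrics(predicted, observed)
print(f"AAE = {aae:.2f} K, AARD = {aard:.2f} %, RMSE = {rmse:.2f} K")
```

Because AARD% is relative to the observed value in kelvin, it is typically a much smaller number than the absolute error in K, which is why the two are often reported together.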

The second category comprises the QSPR models, in which FP is correlated using chemical structure-based parameters called “molecular descriptors”. These correlations relate the FP to the chemical structure alone and do not need any other physical properties. Examples are the QSPR models presented by Tetteh et al. (400 compounds, average absolute error of approximately 11 K) [17], Katritzky et al. (the former: 271 compounds, root mean square error of 23.03 K; the latter: 758 compounds, AARD of 3.49% and average absolute error of 10.65 K) [18], [19], and Gharagheizi and Alamdari (1378 compounds, AARD of 10.2%) [2]. There are numerous studies in this category; however, only the general models are considered in this study. Developing general correlations of this type is much more difficult than developing ones for particular chemical families such as hydrocarbons. The most important drawback of the QSPR models is the complex procedure for calculating the molecular descriptors from the chemical structure. As a result, these correlations are not simple to use.

The third category includes the group contribution (GC) models. In these methods, FP is correlated with the number of occurrences of certain chemical substructures. It seems the only general model from this class is the one proposed by the first author and his co-workers (1030 compounds, AARD of 10.2%) [3]. There are some other GC methods that were proposed only for particular chemical families; they are not considered in this study. Perhaps the only important weak point of the GC models is the large number of parameters that they require. In addition, in the recently proposed version of GC, namely artificial neural network-group contribution (ANN-GC), the complexity of the model is another issue.

A comprehensive comparison between these three categories is difficult because several factors must be taken into account, for instance, the simplicity of the model, the accuracy of the model, the simplicity of the parameters, and the comprehensiveness of the method in covering a wide applicability domain. The last factor includes both the number and the diversity of the chemical compounds employed in developing and validating the model.

According to the reported statistical parameters of the models, the first category seems to be more convincing than the others because of its simplicity, accuracy, and comprehensiveness.

Based on the statistical parameters of the models, it can be concluded that, despite significant progress in the estimation of FP using the QSPR and GC methods, the empirical correlations give more comprehensive and more accurate results. Empirical correlations normally give acceptable results within the range of conditions and compounds used for their development. Semi-empirical correlations use some theoretical basis in the form of parameters to improve the prediction capabilities.

In any case, certain parameters of the aforementioned correlations must be regressed against the experimental data. Many mathematical methods, including linear and nonlinear regression methods and various kinds of optimization techniques, have so far been proposed for this purpose.

The genetic algorithm (GA), first introduced by Holland [20], is a heuristic optimization technique (one of the evolutionary algorithms) that mimics the process of natural evolution. It generates solutions (chromosomes) to optimization problems through specific operators such as selection, mutation, and crossover [21]. The final solutions are encoded in fixed-length binary (0 and 1) strings. Modifications of this algorithm mainly focus on manipulating these operators. Genetic programming (GP) [22] is an effective improvement of the GA in which the solutions are represented as nonlinear parse-tree structures (treated as functions) instead of fixed-length binary strings. This modification allows the search to range over a variety of possible functions when finding the final solution [22]. Considering the drawbacks of GP (which will be discussed later), Ferreira [21] introduced a very fruitful modification to the original GP algorithm [22]. In the new strategy, called “gene expression programming (GEP)” [23], ramified structures of different sizes and shapes (parse trees) are completely encoded in linear chromosomes of fixed length, which leads to a higher probability of obtaining the global optimum of the model parameters [21], [23]. The GEP [21] algorithm is described in the next section.
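
The idea of encoding a parse tree in a fixed-length linear chromosome can be sketched as follows. This is a minimal illustration in the spirit of Ferreira's K-expression (Karva) notation, in which the linear gene is read left to right and the tree is filled level by level; the function set, the names `decode` and `evaluate`, and the sample gene are assumptions for illustration and are not the settings used in this work.

```python
import math
import operator

# Assumed function set: symbol -> (arity, implementation). "Q" denotes square root.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "Q": (1, math.sqrt),
}

def decode(gene):
    """Translate a linear gene into a tree of [symbol, children] nodes, breadth-first."""
    nodes = [[symbol, []] for symbol in gene]
    next_child = 1  # index of the next symbol not yet attached to a parent
    i = 0
    while i < next_child:  # symbols at or beyond next_child stay unexpressed
        symbol = nodes[i][0]
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        children = nodes[next_child:next_child + arity]
        if len(children) < arity:
            raise ValueError("gene ends before all function arguments are filled")
        nodes[i][1] = children
        next_child += arity
        i += 1
    return nodes[0]

def evaluate(node, variables):
    """Recursively evaluate a decoded tree for the given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        args = [evaluate(child, variables) for child in children]
        return FUNCTIONS[symbol][1](*args)
    return variables[symbol]

# The gene "Q*+-abcd" encodes sqrt((a + b) * (c - d)); trailing symbols are ignored.
values = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0}
result = evaluate(decode("Q*+-abcdabab"), values)
```

Because unused trailing symbols are simply left unexpressed, genes of one fixed length can encode trees of different sizes and shapes, which is the property the paragraph above attributes to GEP.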

The GEP [21] strategy has so far been implemented for several purposes in electrical and mechanical engineering, such as the development of stage-discharge curves of rivers [24] and the prediction of the splitting tensile strength of concrete [25]; it is therefore of great interest to employ the same algorithm for determining the flammability characteristics of chemical compounds, such as FP. It should be noted that our group has very recently applied the method to develop corresponding states models for the thermal conductivity of gases [26], the viscosity of gases [27], and solubility parameters [28].

Genetic programming

As mentioned earlier, GP [22] is an extension of the genetic algorithm. The defined problem (the forms of the functions, the number of parameters, etc.) does not affect the main organization of the GP search procedure [22], [23]. The main distinction between GP [22] and the GA [20] is that in the former the chromosomes consist of nonlinear structures similar to parse trees, though they are similar to the GA [22] linear structures, which are naked replicators working as genotype and phenotype
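
For contrast with the tree-shaped GP chromosomes, the linear GA chromosomes and the three operators named in the Introduction (selection, mutation, and crossover) can be sketched on fixed-length binary strings. This is a generic textbook-style illustration, not the procedure of this work; one-point crossover, bit-flip mutation, tournament selection, and all function names are assumptions.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit
                   for bit in chromosome)

def tournament_select(population, fitness, rng, k=2):
    """Pick k random individuals and return the fittest of them."""
    return max(rng.sample(population, k), key=fitness)

# Toy usage: fitness is simply the number of 1 bits (illustrative only).
rng = random.Random(0)
population = ["00000000", "11110000", "10101010", "11111111"]
parent_a = tournament_select(population, lambda s: s.count("1"), rng)
parent_b = tournament_select(population, lambda s: s.count("1"), rng)
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
child_a = mutate(child_a, 0.05, rng)
```

Every operator here returns a string of the same length as its input, so any result is again a valid chromosome; the same cannot be said of arbitrary edits to a parse tree.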

Data for flash point temperature

As the quality and generality of the applied database have a direct influence on the developed models and correlations for determination of FP, the DIPPR 801 database has been used in this work [1]. The DIPPR 801 flash point temperatures of 1471 pure compounds have been employed for developing and validating an accurate correlation. The main data set was divided into three sub data sets: the “training set”, the “optimization set”, and the “prediction set”. The first one is used to develop the model. The
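
The three-way division can be sketched as below. The proportions follow the Conclusion (883 compounds, about 60%, for development and about 294, around 20%, for the next set); the size of the remaining set follows by subtraction. How the compounds were actually assigned to each set is not stated in this text, so the random shuffle, the seed, and the function name `split_dataset` are assumptions.

```python
import random

def split_dataset(items, train_fraction=0.6, optimization_fraction=0.2, seed=0):
    """Shuffle items and split them into training, optimization, and prediction sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n_train = round(train_fraction * len(shuffled))
    n_opt = round(optimization_fraction * len(shuffled))
    training = shuffled[:n_train]
    optimization = shuffled[n_train:n_train + n_opt]
    prediction = shuffled[n_train + n_opt:]  # whatever remains
    return training, optimization, prediction

# Compound indices stand in for the 1471 DIPPR 801 entries.
training, optimization, prediction = split_dataset(range(1471))
print(len(training), len(optimization), len(prediction))
```

With 1471 entries this gives sets of 883, 294, and 294 compounds, which matches the counts quoted in the Conclusion for the first two sets.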

Developing the correlation

As previously explained, one of the significant features of the GEP [21] mathematical strategy is that there is no need to assume specific functional forms to find the best prediction of the experimental data. Thus, the most accurate functional (correlation) form, involving the most efficient independent parameters, is obtained through the evolutionary algorithm itself. As mentioned earlier, empirical models for estimation of FP employ some other properties such as normal boiling point,

Results and discussion

The aforementioned calculation procedure has been followed to obtain an accurate and simple correlation. The computational steps of the GEP [21] algorithm themselves select the parameters that yield the most accurate correlation from the introduced parameters (Tc, Pc, ω, Tb, Mw). Therefore, one can consider several independent parameters for a particular problem and obtain the ones that have the greatest effect on the desired output results. The final correlation can be represented

Conclusion

In this work, the gene expression programming [21] mathematical algorithm was applied to determine the flash point temperatures of 1471 chemical compounds. The major focus of this work is on improving the accuracy and generality of previously presented correlations for evaluating one of the most important flammability characteristics of chemical compounds. 883 DIPPR 801 FP data [1] were applied for developing (about 60% of the whole dataset) and about 294 ones (around 20% of

References (29)

  • A. Fujii et al., J. Saf. Res. (1982)
  • K. Satyanarayana et al., J. Hazard. Mater. (1992)
  • A.R. Katritzky et al., J. Mol. Graphics Modell. (2007)
  • F. Özcan, Constr. Build. Mater. (2012)
  • Project 801, Evaluated Process Design Data, Public Release Documentation, Design Institute for Physical Properties (DIPPR) (2006)
  • F. Gharagheizi et al., QSAR Comb. Sci. (2008)
  • F. Gharagheizi et al., Energy Fuels (2008)
  • F. Gharagheizi et al., Ind. Eng. Chem. Res. (2011)
  • F. Gharagheizi, M.H. Keshavarz, M. Sattari, J. Therm. Anal. Calorim., http://dx.doi.org/10.1007/s10973-011-1951-5, in...
  • M. Vidal et al., Process Saf. Prog. (2004)
  • L. Catoire et al., J. Phys. Chem. Ref. Data (2004)
  • X. Liu et al., J. Chem. Eng. Data (2010)
  • R.W. Prugh, J. Chem. Educ. (1973)
  • G.S. Patil, Fire Mater. (1988)