A genetic programming-based QSPR model for predicting solubility parameters of polymers

https://doi.org/10.1016/j.chemolab.2015.04.005

Highlights

  • The genetic programming (GP) model accurately predicts the solubility parameters.

  • GP reconstructs a transparent relationship for predicting the solubility parameters.

  • The GP captures nonlinear relationships among the molecular descriptors.

Abstract

In this study, linear and nonlinear quantitative structure-property relationship (QSPR) models, respectively called the multiple linear regression based QSPR (MLR-QSPR) model and the genetic programming based QSPR (GP-QSPR) model, were built to predict the solubility parameters of polymers with structure –(C1H2–C2R3R4)– as a function of some constitutional, topological and quantum chemical descriptors. The results from the internal validation analysis indicated that the GP-QSPR model has better goodness-of-fit statistics. The external and overall validation measures also confirmed that the GP-QSPR model significantly outperforms the MLR-QSPR model on several performance metrics over the same testing data set, and that genetic programming has good potential to yield more accurate models in QSPR studies.

Introduction

Prediction of the solubility parameters of polymers is of great importance in many technological and industrial applications of polymers [1], [2], [3], [4], [5]. The solubility parameter is an intrinsic physicochemical parameter defined within Hildebrand–Scatchard solution theory [6], [7]. However, its experimental determination is not easy, and traditional methodologies (e.g., group contribution methods) for predicting polymer solubility are insufficient to meet accuracy requirements [8], [9], [10], [11]. A few studies in the literature show that, in such cases, developing predictive quantitative structure–property relationship (QSPR) models with linear or nonlinear data-driven methods (e.g., multiple linear regression, artificial neural networks and fuzzy set theory) appears to be a good alternative for overcoming the limitations of conventional approaches such as the group contribution methods [12], [13], [14]. As an example, Yu et al. [12] introduced a multiple linear regression based QSPR model for the prediction of solubility parameters of amorphous polymers. They concluded that the presented QSPR model has a valuable ability to correlate the solubility parameters with the six molecular descriptors and that its predictions were better than those of the previous models [13], [14]. However, the correlation coefficient between experimental and predicted solubility parameters was limited to 0.840 during the testing stage of the proposed model.
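As a minimal illustration of the metric quoted above, the correlation coefficient between experimental and predicted values can be computed as in the following Python sketch. It assumes the Pearson definition of the correlation coefficient, which the text does not state explicitly. The function name pearson_r and the numeric values are hypothetical placeholders, not data from Yu et al. [12].

```python
import math


def pearson_r(experimental, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(experimental) != len(predicted) or len(experimental) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_e == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_p)


if __name__ == "__main__":
    # Made-up placeholder values for illustration only (not measured data).
    experimental = [16.0, 18.0, 20.0, 22.0]
    predicted = [16.5, 17.5, 20.5, 21.5]
    print(f"R = {pearson_r(experimental, predicted):.3f}")
```

A value close to 1 indicates that predictions rise and fall with the experimental values; it does not by itself measure the size of the prediction errors.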

This study examines the applicability of another data-driven method, genetic programming (GP), to the prediction of the solubility parameters of polymers. GP [15] is a purely nonlinear modeling approach that can be described as an extension of the well-known genetic algorithm (GA). The main difference between them is the representation of the solution. While GA uses a string of numbers to represent the solution, GP solutions are computer programs. GP creates computer programs to solve a problem using the principle of Darwinian natural selection. It mainly differs from other data-driven models (e.g., artificial neural networks) in that it defines an explicit functional relationship between input and output variables by optimizing the model structure and its coefficients simultaneously. To the authors' knowledge, applications of GP in QSPR studies are very few and include prediction of the wavelength of the lowest UV transition for a system of 18 anthocyanidins [16] and of the sublimation enthalpy of a wide range of organic contaminants from their 3D molecular structures alone [17].
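The difference in representation can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used in this study: the function set, the variable names x1 and x2, the nested-tuple encoding and all helper names are assumptions made for the example.

```python
import operator
import random

# Hypothetical function set: name -> (binary callable, printable symbol).
FUNCTIONS = {
    "add": (operator.add, "+"),
    "sub": (operator.sub, "-"),
    "mul": (operator.mul, "*"),
}


def evaluate_ga(coefficients, inputs, variables=("x1", "x2")):
    """GA-style individual: a string of numbers for a FIXED linear structure."""
    total = coefficients[0]
    for coeff, name in zip(coefficients[1:], variables):
        total += coeff * inputs[name]
    return total


def random_tree(rng, variables, max_depth):
    """GP-style individual: a randomly grown expression tree (nested tuples)."""
    if max_depth == 0 or rng.random() < 0.3:
        # Leaf: either an input variable or a random constant.
        if rng.random() < 0.7:
            return rng.choice(variables)
        return round(rng.uniform(-1.0, 1.0), 3)
    name = rng.choice(sorted(FUNCTIONS))
    left = random_tree(rng, variables, max_depth - 1)
    right = random_tree(rng, variables, max_depth - 1)
    return (name, left, right)


def evaluate_tree(tree, inputs):
    """Recursively evaluate an expression tree for one set of input values."""
    if isinstance(tree, tuple):
        func, _ = FUNCTIONS[tree[0]]
        return func(evaluate_tree(tree[1], inputs), evaluate_tree(tree[2], inputs))
    if isinstance(tree, str):
        return inputs[tree]
    return tree  # numeric constant


def to_string(tree):
    """Render the tree as an explicit, human-readable formula."""
    if isinstance(tree, tuple):
        _, symbol = FUNCTIONS[tree[0]]
        return f"({to_string(tree[1])} {symbol} {to_string(tree[2])})"
    return str(tree)


if __name__ == "__main__":
    inputs = {"x1": 3.0, "x2": 1.0}
    print("GA individual:", [1.0, 2.0, 1.0], "->", evaluate_ga([1.0, 2.0, 1.0], inputs))
    tree = random_tree(random.Random(0), ["x1", "x2"], max_depth=3)
    print("GP individual:", to_string(tree), "->", evaluate_tree(tree, inputs))
```

In the GA case only the numbers change during the search, whereas in the GP case both the shape of the formula and its constants can vary. That is why a GP result can be read directly as an explicit equation.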

The present work focuses on further development of QSPR models for accurately predicting the solubility parameters of polymers. The GP based QSPR model was developed by using the experimental solubility parameters and molecular descriptors of 97 polymers with structure –(C1H2–C2R3R4)–, which were previously given in Yu et al. [12]. Its predictive performance was compared with that of a multiple linear regression based QSPR model. This study is the first to investigate the implementation of GP in this field.

Section snippets

Theory and calculation

The process of GP starts with a random initial population of computer programs. An individual program present in the population refers to a parse tree, which is generated by the combination of its functions (nodes) and terminals (leaves) that are defined in a function set and terminal set, appropriate to the problem, respectively [15]. A function set may consist of basic arithmetic operators, mathematical functions, conditional operators, Boolean operators, iterative functions and any

Results and discussion

The GP based QSPR model was developed to explore explicit relationships between the solubility parameter and the influencing variables (i.e., molecular descriptors calculated directly from the repeating unit structures of polymers). Yu et al. [12] showed that the solubility parameter (δ) can be formulated as follows:

δ = f(hb, alk, nN, Qii, Eint, QH)

where hb = mQ±n2, m is the number of –OH, –NH or –CN groups in the side groups, Q± is a hydrogen bond descriptor, n is the number of atoms

Conclusions

This study investigates the development of a GP based QSPR model to predict the solubility parameters of polymers. The results show that the predictive performance of the GP based model is better than that of the traditional regression based model, since GP captures complex real-world relationships more effectively than conventional regression methods. Apart from the improvements in the prediction performance gained by using GP, this study demonstrates that GP can

Conflict of interest

The authors declare that there is no conflict of interest.

Acknowledgments

The authors thank Xinliang Yu, Xueye Wang, Hanlu Wang, Xiaobing Li and Jinwei Gao for all data used in this work.

References (32)

Cited by (17)

  • A simple correlation for reliable prediction of intrinsic viscosity (limiting viscosity number) of different polymer-solvent combinations

    2022, Fluid Phase Equilibria
    Citation Excerpt :

    Thus, δD, δP, and δH in Eq. (2) can forecast the behavior of such systems more accurately rather than using a single-valued solubility parameter [12]. Group contributions theory and quantitative structure-property relationships (QSPR) methodology are two different approaches, which have been recently developed for the prediction of δ, δD, δP, and δH of polymers and solvents [1,13-16]. Some QSPR methods based on complex descriptors have been developed to predict intrinsic viscosity.

  • Chemometrics approach for the prediction of chemical compounds’ toxicity degree based on quantum inspired optimization with applications in drug discovery

    2019, Chemometrics and Intelligent Laboratory Systems
    Citation Excerpt :

    This means that each fold contains roughly the same proportions of the phenol types. The last set of experiments was conducted to test the validity and reliability of the suggested prediction model in drug discovery applications using benchmark logD7:4 dataset that was collected from [32,33]. Some comparisons were made between the suggested model and other regression models such as multiple linear regression (MLR) and partial least squares (PLS) to predict lipophilicity, the key physical property for small molecule oral drugs, because it is a key determinant of a range of ADME properties.

  • New prediction methods for solubility parameters based on molecular sigma profiles using pharmaceutical materials

    2018, International Journal of Pharmaceutics
    Citation Excerpt :

    However, this is rather an initial proposal for future updates to assign the molecular group contribution based on more data because there are currently only limited experimental values available. Other recent approaches are a determination of solubility parameters from molecular dynamics simulations (Gupta et al., 2011) or from quantitative structure property relationships (QSPR) (Gharagheizi, 2008; Goodarzi et al., 2010; Járvás et al., 2011; Koç and Koç, 2015). The latter QSPR relationships are based on selecting suitable molecular predictors regarding solubility parameter but this section is often rather arbitrary.

  • Multivariate optimization of Pb(II) removal for clinoptilolite-rich tuffs using genetic programming: A computational approach

    2018, Chemometrics and Intelligent Laboratory Systems
    Citation Excerpt :

    GP is an AI technique based on Darwin's selection principles and biological operations (reproduction, crossover, and mutation). It presents advantages over other AI modeling techniques because its scheme produces tangible, white-box models which are easily interpretable by engineers and scientists [34,35]. In GP, the mathematical formulas of the population (individuals) are reproduced through generations, preserving the best individuals that eventually evolve [36].

  • The Removal of arsenite [As(III)] and arsenate [As(V)] ions from wastewater using TFA and TAFA resins: Computational intelligence based reaction modeling and optimization

    2016, Journal of Environmental Chemical Engineering
    Citation Excerpt :

    An in-depth treatment of the GP-based symbolic regression can be found, for example, in Vyas et al. [18], Poli et al. [28], and Patil-Shinde et al. [29]. There exists a number of studies in chemistry and chemical engineering wherein the GP-based symbolic regression has been employed for developing data-driven predictive models (see, for example, Patil-Shinde et al. [30], Goel et al. [31], Pandey et al. [32], Koç and Koç [33], and Bahrami et al. [34]). It may however be noted that despite its several attractive properties, GP compared to ANNs and SVR formalisms has been utilized infrequently in chemistry and chemical engineering/technology.

  • Toxicity: 77 Must-Know Predictions of Organic Compounds: Including Ionic Liquids

    2023, Toxicity: 77 Must-Know Predictions of Organic Compounds: Including Ionic Liquids
1 Tel.: + 90 346 2191010/1318; fax: + 90 346 2191170.
