A genetic programming-based QSPR model for predicting solubility parameters of polymers
Introduction
Prediction of polymers' solubility parameters is of great importance in many technological and industrial applications of polymers [1], [2], [3], [4], [5]. The solubility parameter is an intrinsic physicochemical parameter that is simply defined in Hildebrand–Scatchard solution theory [6], [7]. However, its experimental determination is not easy, and traditional methods for predicting polymer solubility parameters (e.g., group contribution methods) are insufficient to meet accuracy requirements [8], [9], [10], [11]. A few studies in the literature show that, in such cases, developing predictive quantitative structure–property relationship (QSPR) models with linear or nonlinear data-driven methods (e.g., multiple linear regression, artificial neural networks and fuzzy set theory) is a promising way to overcome the limitations of conventional approaches such as group contribution methods [12], [13], [14]. For example, Yu et al. [12] introduced a QSPR model based on multiple linear regression for predicting the solubility parameters of amorphous polymers. They concluded that their model correlated the solubility parameters well with six molecular descriptors and that its predictions were better than those of previous models [13], [14]. However, the correlation coefficient between the experimental and predicted solubility parameters was only 0.840 in the testing stage of that model.
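For reference, the Hildebrand solubility parameter is the square root of the cohesive energy density, δ = (E_coh/V_m)^(1/2); for a volatile liquid the molar cohesive energy is commonly estimated as E_coh ≈ ΔH_vap − RT. Polymers cannot be vaporized, which is one reason their δ has to be obtained indirectly. The minimal sketch below applies this textbook relation to a low-molar-mass liquid; the function name and the water property values in the example are illustrative assumptions, not data from this study.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def hildebrand_delta(dh_vap_j_per_mol, molar_volume_cm3_per_mol, temperature_k=298.15):
    """Hildebrand solubility parameter (MPa^0.5) of a volatile liquid.

    delta = sqrt(CED), with CED = (dH_vap - R*T) / V_m.
    Because 1 J/cm^3 = 1 MPa, a CED in J/cm^3 gives delta directly in MPa^0.5.
    """
    cohesive_energy = dh_vap_j_per_mol - R_GAS * temperature_k  # J/mol
    ced = cohesive_energy / molar_volume_cm3_per_mol             # J/cm^3
    return math.sqrt(ced)


# Illustrative input: water at 25 C (dH_vap ~ 43.99 kJ/mol, V_m ~ 18.07 cm^3/mol)
print(round(hildebrand_delta(43990.0, 18.07), 1))  # 47.9
```

The result is close to the commonly tabulated value of about 48 MPa^0.5 for water.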
This study examines the applicability of another data-driven method, genetic programming (GP), to predicting the solubility parameters of polymers. GP [15] is a purely nonlinear modeling approach that can be described as an extension of the well-known genetic algorithm (GA). The main difference between them is how a solution is represented: GA encodes a solution as a string of numbers, whereas GP solutions are computer programs. GP evolves computer programs to solve a problem using the principle of Darwinian natural selection. It differs from other data-driven models (e.g., artificial neural networks) mainly in that it yields an explicit functional relationship between input and output variables by optimizing the model structure and its coefficients simultaneously. To the authors' knowledge, applications of GP in QSPR studies are very few; they include prediction of the wavelength of the lowest UV transition for a system of 18 anthocyanidins [16] and of the sublimation enthalpies of a wide range of organic contaminants from their 3D molecular structures alone [17].
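The representational difference between GA and GP can be illustrated with a minimal sketch. The GA individual is a fixed-length vector of coefficients for a model form chosen in advance (a hypothetical linear model here), whereas the GP individual is a parse tree whose structure and constants can both evolve. The nested-tuple encoding, the function names and the variables x1 and x2 are illustrative assumptions, not the representation used by any particular GP software.

```python
import math

# GA-style individual: a fixed-length string of numbers. The model form
# (hypothetical here: y = c0 + c1*x1 + c2*x2) is fixed; only coefficients evolve.
ga_chromosome = [1.5, 0.8, -0.3]


def eval_ga(chromosome, x1, x2):
    c0, c1, c2 = chromosome
    return c0 + c1 * x1 + c2 * x2


# GP-style individual: a parse tree of functions (internal nodes) and
# terminals (leaves), encoded as nested tuples (function_name, child, ...).
gp_tree = ("add", ("mul", "x1", "x1"), ("sqrt", "x2"))  # x1*x1 + sqrt(x2)

FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sqrt": lambda a: math.sqrt(abs(a)),  # "protected" square root
}


def eval_gp(node, env):
    """Recursively evaluate a parse tree for the variable values in env."""
    if isinstance(node, tuple):
        name, *children = node
        return FUNCTIONS[name](*(eval_gp(child, env) for child in children))
    if isinstance(node, str):
        return env[node]  # variable terminal
    return node  # numeric constant terminal


print(eval_ga(ga_chromosome, 2.0, 1.0))
print(eval_gp(gp_tree, {"x1": 3.0, "x2": 4.0}))  # 3.0*3.0 + sqrt(4.0) = 11.0
```

Changing the GA model would require a new chromosome layout, whereas a GP tree can grow, shrink or swap subtrees, which is how GP searches over model structure and coefficients at the same time.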
The present work focuses on further developing QSPR models for accurately predicting the solubility parameters of polymers. The GP-based QSPR model was developed using the experimental solubility parameters and molecular descriptors of 97 polymers with the repeating-unit structure –(C1H2–C2R3R4)–, previously reported by Yu et al. [12]. Its predictive performance was compared with that of the multiple linear regression-based QSPR model. This study is the first to investigate the application of GP in this field.
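Comparisons between QSPR models, such as the GP versus multiple linear regression comparison here, are usually reported with statistics like the correlation coefficient R between experimental and predicted values (the statistic quoted as 0.840 for the model of Yu et al. [12]) and the root-mean-square error. A minimal, dependency-free sketch of both follows, assuming the Pearson form of R; the function names and the toy numbers are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy values (hypothetical), e.g. solubility parameters in MPa^0.5
exp_delta = [17.0, 18.5, 19.2, 20.1, 22.4]
pred_delta = [17.4, 18.1, 19.6, 20.0, 21.9]
print(f"R = {pearson_r(exp_delta, pred_delta):.3f}, RMSE = {rmse(exp_delta, pred_delta):.3f}")
```

R measures how well the predictions track the experimental trend, while RMSE measures the typical size of the error in the units of δ; reporting both avoids rewarding a model that is well correlated but systematically biased.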
Theory and calculation
The GP process starts with a random initial population of computer programs. Each individual program in the population is a parse tree built from functions (internal nodes) and terminals (leaves), drawn respectively from a function set and a terminal set chosen to suit the problem [15]. A function set may consist of basic arithmetic operators, mathematical functions, conditional operators, Boolean operators, iterative functions and any
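A random initial population of parse trees can be generated from a function set and a terminal set as sketched below. This sketch uses the "grow" initialization method, one of the standard options alongside "full" and "ramped half-and-half"; the function set, protected division, variable names, constant range and probabilities are hypothetical choices for illustration, not the settings used in this study.

```python
import math
import random

# Hypothetical function set: name -> (arity, implementation)
FUNCTION_SET = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),  # protected division
    "sqrt": (1, lambda a: math.sqrt(abs(a))),                  # protected square root
}
# Hypothetical terminal set: input variables plus random numeric constants
VARIABLES = ["x1", "x2", "x3"]


def random_terminal(rng):
    if rng.random() < 0.7:
        return rng.choice(VARIABLES)
    return round(rng.uniform(-5.0, 5.0), 3)


def grow(rng, max_depth):
    """Grow method: pick any primitive at each node; force a leaf at max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return random_terminal(rng)
    name = rng.choice(list(FUNCTION_SET))
    arity = FUNCTION_SET[name][0]
    return (name, *(grow(rng, max_depth - 1) for _ in range(arity)))


def evaluate(node, env):
    """Recursively evaluate a tree for the variable values in env."""
    if isinstance(node, tuple):
        name, *children = node
        return FUNCTION_SET[name][1](*(evaluate(c, env) for c in children))
    if isinstance(node, str):
        return env[node]
    return node


def depth(node):
    """Depth of a tree; a single leaf has depth 0."""
    if isinstance(node, tuple):
        return 1 + max(depth(c) for c in node[1:])
    return 0


def initial_population(size, max_depth, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    return [grow(rng, max_depth) for _ in range(size)]


for tree in initial_population(3, 3, seed=42):
    print(tree, "->", evaluate(tree, {"x1": 1.0, "x2": 2.0, "x3": 3.0}))
```

Protected operators keep every randomly generated tree evaluable, so no individual has to be discarded before fitness evaluation.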
Results and discussion
The GP-based QSPR model was developed to explore explicit relationships between the solubility parameter and the influencing variables (i.e., molecular descriptors calculated directly from the repeating-unit structures of the polymers). Yu et al. [12] showed that the solubility parameter (δ) can be expressed as follows: … where …, m is the number of –OH, –NH or –CN groups in the side groups, Q± is the hydrogen-bond descriptor, n is the number of atoms
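The multiple linear regression baseline against which the GP model is compared expresses δ as a linear combination of molecular descriptors. A minimal least-squares sketch of fitting such a model follows; it solves the normal equations in plain Python, and the two descriptors, their values and the coefficients in the example are synthetic and hypothetical, not the descriptors or coefficients of Yu et al. [12].

```python
def fit_mlr(X, y):
    """Ordinary least-squares fit y ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, by Gauss-Jordan elimination with partial pivoting.
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot magnitude for stability
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def predict_mlr(coefs, x):
    """Evaluate the fitted linear model for one descriptor vector x."""
    return coefs[0] + sum(c * xi for c, xi in zip(coefs[1:], x))


# Synthetic example: two hypothetical descriptors, delta = 2 + 0.5*d1 - 1.5*d2
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7]]
y = [2 + 0.5 * d1 - 1.5 * d2 for d1, d2 in X]
coefs = fit_mlr(X, y)
print([round(c, 6) for c in coefs])  # recovers approximately [2.0, 0.5, -1.5]
```

Solving the normal equations is adequate for a small, well-conditioned descriptor set; for strongly correlated descriptors a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically safer. Unlike GP, this fit can only adjust the coefficients of a model form fixed in advance.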
Conclusions
This study investigates the development of a GP-based QSPR model to predict the solubility parameters of polymers. The results show that the predictive performance of the GP-based model is better than that of the traditional regression-based model, because GP can capture complex real-world relationships more effectively than conventional regression methods. Apart from the improvements in prediction performance gained by using GP, this study demonstrates that GP can
Conflict of interest
The authors declare that there is no conflict of interest.
Acknowledgments
The authors thank Xinliang Yu, Xueye Wang, Hanlu Wang, Xiaobing Li and Jinwei Gao for all data used in this work.
References (32)
- et al. Molecular modeling of polymer composite-analyte interactions in electronic nose sensors. Sens. Actuators B (2003).
- et al. Extended Hildebrand approach: solubility of caffeine in dioxane–water mixtures. J. Pharm. Sci. (1980).
- et al. Polymer property modeling using grid technology for design of structured products. Fluid Phase Equilib. (2007).
- Predicting aqueous solubility of chlorinated hydrocarbons from molecular structure. Fluid Phase Equilib. (2002).
- et al. A review of polymer dissolution. Prog. Polym. Sci. (2003).
- et al. A new 3D molecular structure representation using quantum topology with application to structure–property relationships. Chemom. Intell. Lab. Syst. (2000).
- et al. Simple yet accurate prediction method for sublimation enthalpies of organic contaminants using their molecular structure. Thermochim. Acta (2012).
- et al. Routine high-return human-competitive automated problem-solving by means of genetic programming. Inf. Sci. (2008).
- et al. Singular value decomposition in AHP. Eur. J. Oper. Res. (2004).
- et al. QSPR with extended topochemical atom (ETA) indices: modeling of critical micelle concentration of non-ionic surfactants. Chem. Eng. Sci. (2012).
- The rm2 metrics and regression through origin approach: reliable and useful validation tools for predictive QSAR models. Eur. J. Pharm. Sci.
- Further exploring rm2 metrics for validation of QSPR models. Chemom. Intell. Lab. Syst.
- Prediction of the pH and the temperature-dependent swelling behavior of Ca2+-alginate hydrogels by artificial neural networks. Chem. Eng. Sci.
- Genetic algorithms based logic-driven fuzzy neural networks for stability assessment of rubble-mound breakwaters. Appl. Ocean Res.
- Prediction of Polymer Properties.
- Conducting Polymers: A New Era in Electrochemistry.
1 Tel.: +90 346 2191010/1318; fax: +90 346 2191170.