Article

A Comparative Study of Random Forest and Genetic Engineering Programming for the Prediction of Compressive Strength of High Strength Concrete (HSC)

1 Department of Civil Engineering, COMSATS University Islamabad, Abbottabad Campus 22060, Pakistan
2 Department of Civil and Environmental Engineering, College of Engineering, King Faisal University (KFU), P.O. Box 380, Al-Hofuf, Al Ahsa 31982, Saudi Arabia
3 Department of Transportation Engineering, Military College of Engineering (MCE), National University of Science and Technology (NUST), Risalpur 23200, Pakistan
4 Department of Civil Engineering, College of Engineering in Al-Kharj, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
* Authors to whom correspondence should be addressed.
Appl. Sci. 2020, 10(20), 7330; https://doi.org/10.3390/app10207330
Submission received: 23 August 2020 / Revised: 29 September 2020 / Accepted: 6 October 2020 / Published: 20 October 2020
(This article belongs to the Special Issue Green Concrete for a Better Sustainable Environment II)

Abstract

Supervised machine learning algorithms are an emerging trend for the prediction of the mechanical properties of concrete. This study uses an ensemble random forest (RF) and a gene expression programming (GEP) algorithm to predict the compressive strength of high strength concrete. The input parameters are cement content, coarse aggregate to fine aggregate ratio, water, and superplasticizer. Statistical measures such as MAE, RSE, and RRMSE are used to evaluate the performance of the models. The RF ensemble model performs best, as it combines weak base learners (decision trees) and achieves a strong coefficient of determination of R2 = 0.96 with fewer errors. The GEP algorithm shows good agreement between actual and predicted values and yields an empirical relation. An external statistical check is also applied to the RF and GEP models to validate the variables against the data points. Artificial neural networks (ANN) and a decision tree (DT) are applied to the same data sample and compared with the aforementioned models. Permutation feature importance, computed in Python, is used to identify the most influential parameters. The machine learning algorithms reveal a strong correlation between targets and predictions, with low error measures confirming the accuracy of the models.

1. Introduction

High strength concrete (HSC) has gained widespread popularity for its superior performance. HSC is valued for its substantially higher strength and durability [1,2,3,4]. Its strength is markedly higher than that of conventional concrete, a quality that has drastically increased its use in the modern construction industry [5]. A new technology that produces homogeneous, dense concrete and also bolsters the strength parameters has driven its adoption within the construction industry [5,6]. It has been commonly used in concrete-filled steel tubes, bridges, and columns. As per the American Concrete Institute (ACI), “HSC is the one that possesses a specific requirement for its working which cannot be achieved by conventional concrete” [7]. Numerous researchers have suggested different methods for the mix design of HSC, and all of these methods require a specific set of experimental trials to achieve the target strength. However, such experimental work is time consuming and requires a substantial amount of money. In addition, inexperienced technicians and errors in testing machines raise questions about the reliability of the experimental work conducted across the globe. Various researchers have used different statistical methods to predict the properties of HSC; some of these studies are summarized in Table 1. However, this field still requires further exploration.
In recent years, machine learning concepts have been used successfully in various fields to predict different properties. Likewise, the civil engineering construction industry has adopted such techniques to overcome cumbersome experimental procedures. Some of these approaches include multivariate adaptive regression splines (MARS) [15,16], gene expression programming (GEP) [17,18,19,20], support vector machines (SVM) [21,22], artificial neural networks (ANN) [23,24,25], decision trees (DT) [26,27,28], the adaptive boost algorithm (ABA), and adaptive neuro-fuzzy inference systems (ANFIS) [29,30,31,32]. Javed et al. [18] predicted the axial behavior of concrete-filled steel tubes (CFST) with 227 data points using gene expression programming and achieved a strong correlation between predicted and experimental axial capacity [18]. Farjad et al. [33] used gene expression programming to predict the mechanical properties of concrete incorporating waste foundry sand. Gregor et al. [34] adopted the ANN approach to evaluate the compressive strength of concrete; ANN reproduced the experimental values accurately and thus proved to be an exceptional prediction tool. Amir et al. [35] predicted the compressive strength of geopolymer concrete incorporating natural zeolite and silica fume using ANN, which established a good relationship and gave high prediction accuracy. Zahra et al. [32] predicted the compressive strength of concrete with ANN and ANFIS models and reported that ANFIS gives a stronger correlation than the ANN model. Javed et al. [36] predicted the compressive strength of sugar cane bagasse ash concrete through an experimental and literature-based study; the experimental work was used to validate the model, the remaining data were gathered from published literature, and the GEP algorithm produced a good model for the target values. Nour et al. [37] used the GEP algorithm to predict the compressive strength of concrete-filled steel tube columns incorporating recycled aggregate (RACFSTC); with 97 data points used in the modeling, a strong correlation was observed. Junfei et al. [38] modeled the compressive strength of self-compacting concrete using a beetle antennae search-based random forest algorithm and obtained a strong correlation of R2 = 0.97 with experimental results. Qinghua et al. [26] employed the random forest approach to predict the compressive strength of high-performance concrete. Similarly, Sun et al. [39] used an evolved random forest algorithm on 138 data samples collected from published literature to predict the compressive strength of rubberized concrete; this advanced approach gave better performance with a strong correlation coefficient of R2 = 0.96. ANN and other models have also been adopted for predicting the mechanical strength parameters of high-performance concrete and recycled aggregate concrete [40,41,42,43,44]. Pala et al. [45] studied the influence of silica fume and fly ash on the compressive strength of concrete; a comprehensive experimental program was carried out to analyze the impact of varying w/c ratios and varying percentages of silica fume and fly ash, and ANN was adopted to depict the effect on the strength parameters of concrete [45]. Azim et al. [44] used a GEP-based machine learning algorithm to predict the compressive arch action capacity of reinforced concrete structures and found GEP to be an effective prediction tool.
This paper aims to predict the compressive strength of high strength concrete (HSC) using an ensemble random forest (RF) algorithm and gene expression programming (GEP). The data points used for modeling were obtained from published articles and are listed in Table S1. Anaconda-based Python programming [46] and the GeneXproTools software [47] were used for the prediction of the compressive strength of HSC. The model uses cement content, water, the coarse aggregate to fine aggregate ratio, and superplasticizer as inputs, and compressive strength as the output. Hex contour graphs were produced to show the relationships between the input and output parameters. Sensitivity analysis (SA) and permutation feature importance (PFI), which address the relative importance of each variable on the desired output, were conducted. Moreover, model evaluation was carried out using statistical measures.

2. Research Methodology

2.1. Random Forest Regression

Random forest regression was proposed by Breiman in 2001 [48] and is considered an improved classification and regression method. The main features of RF include speed and flexibility in creating the relationship between input and output functions. In addition, RF handles large datasets more efficiently than many other machine learning techniques. RF has been used in various fields; for instance, it has been used in banking to predict customer response [49], to predict the direction of stock market prices [50], in the medicine/pharmaceutical industry [51], in e-commerce [52], etc.
The RF method consists of the following main steps:
  • Growing a collection of regression trees on the training set.
  • Averaging the outputs of the individual regression trees.
  • Cross-validating the predicted data using the validation set.
A new training set consisting of bootstrap samples is generated by sampling the original training set with replacement. During this step, some sample points are left out while others appear more than once. The omitted sample points are collected in a separate set, known as the out-of-bag samples. Roughly two-thirds of the sample points are used to estimate the regression function, while the out-of-bag samples are used to validate the model. The process is repeated several times until the required accuracy is achieved. This built-in process of setting points aside as out-of-bag samples and utilizing them for validation is a unique capability of random forest regression. At the end, the total error is calculated for each regression tree, which shows the efficiency of each tree.
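As a minimal sketch of these steps (assuming a scikit-learn environment and a hypothetical file "hsc_data.csv" holding the four mix variables and the measured strength; the column names are illustrative), the bootstrap and out-of-bag mechanism can be reproduced as follows:

```python
# Minimal sketch of random forest regression with out-of-bag validation.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("hsc_data.csv")  # hypothetical file with the 357 mixes
X = df[["cement", "fc_agg_ratio", "water", "superplasticizer"]]
y = df["compressive_strength"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# bootstrap=True resamples the training set with replacement for every tree;
# oob_score=True scores each tree on the samples it did not see (out-of-bag).
rf = RandomForestRegressor(n_estimators=100, bootstrap=True, oob_score=True, random_state=42)
rf.fit(X_train, y_train)

print("Out-of-bag R2:", rf.oob_score_)
print("Test R2:", rf.score(X_test, y_test))
```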

2.2. Gene Expression Programming

GEP was proposed by Ferreira [53] as an improved form of genetic programming (GP). It uses linear strings that are expressed as parse trees of varying sizes and shapes. The GEP model includes a function set, a terminal set, terminal conditions, control parameters, and an objective function. GEP creates an initial set of selected individuals and converts them into expression trees of different sizes and shapes; this step is necessary to represent GEP solutions in mathematical form. The predicted value is then compared with the experimental one to calculate the fitness of each data point. The model stops when the overall fitness of the complete dataset stops improving. The best-performing chromosome is selected and passed to the next generation, and the process repeats itself until a satisfactory fitness is obtained.
Chromosomes in GEP consist of arithmetic operators, variables (terminals), and constants arranged in fixed-length strings. An example of a GEP gene is shown in Equation (1):
+ . y . B . B . . + . A . D . C . 2 . B . C . 3
where A, B, C, D are variables (terminal set) and 2, 3 are constants.
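To make the gene-to-tree mapping concrete, the short sketch below decodes a simplified Karva-style gene string into an expression tree and evaluates it. The gene, the decoder, and the terminal values are illustrative assumptions for exposition, not the GeneXproTools implementation:

```python
from operator import add, sub, mul, truediv

# Function symbols with their arity; anything else is a terminal (variable or constant).
FUNCS = {'+': (add, 2), '-': (sub, 2), '*': (mul, 2), '/': (truediv, 2)}

def karva_to_tree(gene):
    """Breadth-first (level-order) decoding of a Karva-style gene string."""
    symbols = list(gene)
    root = {'sym': symbols[0], 'args': []}
    level, pos = [root], 1
    while level:
        nxt = []
        for node in level:
            arity = FUNCS[node['sym']][1] if node['sym'] in FUNCS else 0
            for _ in range(arity):
                child = {'sym': symbols[pos], 'args': []}
                pos += 1
                node['args'].append(child)
                nxt.append(child)
        level = nxt
    return root

def evaluate(node, terminals):
    sym = node['sym']
    if sym in FUNCS:
        fn, _ = FUNCS[sym]
        return fn(*(evaluate(arg, terminals) for arg in node['args']))
    return terminals.get(sym, float(sym))  # variable value, else numeric constant

# The hypothetical gene '+-*ABCD' expresses the tree (A - B) + (C * D).
tree = karva_to_tree('+-*ABCD')
print(evaluate(tree, {'A': 2.0, 'B': 3.0, 'C': 10.0, 'D': 4.0}))  # 39.0
```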

3. Experimental Database Representation

3.1. Dataset Used in Modeling Aspect

Model evaluation depends on the data sample and the number of parameters used. A total of 357 data points were obtained from published literature (see Table S1). These points were used for training, validation, and testing during modeling to build a numerical, empirically based relation for HSC; splitting the data in this way minimizes overfitting in machine learning approaches. The samples were divided into 70/15/15 sets to obtain a reliable correlation coefficient. Behnood et al. [54] predicted the mechanical properties of concrete with data taken from published literature, randomly distributing the samples into training (70%), validation (15%), and testing (15%) sets. Similarly, Getahun et al. [55] forecasted the mechanical properties of concrete by distributing the data in the same way. Training is done to fit the model to the given values, which then predicts the strength of unseen values, namely the test set.
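A minimal sketch of this 70/15/15 split, assuming the feature matrix X and target vector y have already been loaded as in the earlier sketch, is shown below:

```python
# Sketch of the 70/15/15 split into training, validation, and testing sets.
from sklearn.model_selection import train_test_split

# First hold out 30% of the points, then split that portion in half
# to obtain the 15% validation and 15% testing sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 250 / 54 / 54 of the 357 points
```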

3.2. Programming-Based Presentation of Datasets

Anaconda-based Python (version 3.7) programming [46] was adopted to visualize the influence of the various input parameters on the mechanical strength of HSC. The compressive strength of concrete is influenced by the parameters used in the experimental work. Thus, cement content (Type 1), water, superplasticizer (polycarboxylate), and fine and coarse aggregate (20 mm) were used in modeling the compressive strength of HSC. The impact of these input parameters was visualized with Python in a Jupyter notebook [56], as shown in Figure 1.
Figure 1 shows the quantities that have a strong influence on the mechanical properties of HSC; the darker regions indicate the optimal/maximum concentration of each variable. Python enables users to gain a deep understanding of the parameters that alter the behavior of the model, and the seaborn library was used to plot the correlations among the desired parameters. The statistical description of the data variables is given in Table 2, with the training, validation, and testing sets described in Table 3, Table 4 and Table 5, respectively. Identifying the parameters that ensure optimum results for all techniques is of core importance.
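A sketch of how a Figure 1 style visualization can be produced is given below; the DataFrame df and its column names are assumptions about how the collected data are stored, and seaborn's hexagonal-bin joint plot is used for the contour-style view:

```python
# Sketch of the hex-bin visualization and a correlation overview.
import matplotlib.pyplot as plt
import seaborn as sns

# Hexagonal-bin joint plot: darker hexagons mark the most frequent
# (cement content, compressive strength) combinations in the dataset.
sns.jointplot(data=df, x="cement", y="compressive_strength", kind="hex")
plt.show()

# A heatmap of pairwise correlations gives a quick overview of how each
# mix variable relates to the measured strength.
sns.heatmap(df.corr(), annot=True, cmap="viridis")
plt.show()
```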

4. GEP Model Development

The secondary objective of this research was to derive a generalized equation for the compressive strength of HSC. For this purpose, a terminal set, a function set, and four parameters (d0: cement content, d1: fine to coarse aggregate ratio, d2: water, d3: superplasticizer) were used in modeling. These input parameters were utilized to develop the model based on gene expression programming. Simple mathematical operations (+, −, /, ×) formed the function set and were used to build an empirical relation, which is a function of the following parameters:
$f_c = f(\text{cement content},\ \text{fine/coarse aggregate},\ \text{water},\ \text{superplasticizer})$
The GEP-based model, like all genetic algorithm models, is significantly influenced by the input parameters (variables) upon which it is built. These variables have a substantial impact on the generalization fitness of the model. The parameter settings used in this study are tabulated in Table 6. The model run time is an important consideration when analyzing the effectiveness of the model, so the settings that control run time should be selected carefully to ensure that the generalized model is always developed within a reasonable time. These parameters were selected by trial and error to obtain the maximum correlation. Root mean squared error (RMSE) was adopted as the fitness measure in modeling. Moreover, the GEP model is expressed as a tree-like architecture whose structure is controlled by the head size and the number of genes [57].
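The GEP runs in this study were carried out in GeneXproTools. As a rough open-source stand-in (tree-based genetic programming rather than gene expression programming proper), a symbolic regressor with the same function set and an RMSE fitness could be sketched as follows, reusing the training and testing arrays from the earlier sketches; all hyperparameter values are illustrative assumptions:

```python
# Sketch only: gplearn evolves tree-based symbolic expressions (GP, not GEP),
# but it supports the same function set (+, -, x, /) and an RMSE fitness metric.
from gplearn.genetic import SymbolicRegressor

sr = SymbolicRegressor(
    population_size=500,            # illustrative population size
    generations=30,
    function_set=('add', 'sub', 'mul', 'div'),
    metric='rmse',                  # fitness measure, as in the GEP runs
    parsimony_coefficient=0.001,    # penalizes overly long expressions
    random_state=42,
)
sr.fit(X_train, y_train)

print(sr._program)                  # the evolved symbolic expression
print("Test R2:", sr.score(X_test, y_test))
```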

5. Model Performance Analysis

To assess the viability of any model and to evaluate its performance, various indicators have been used. Each indicator has its own way of characterizing model performance. The indicators commonly used include root mean squared error (RMSE), mean absolute error (MAE), relative squared error (RSE), relative root mean squared error (RRMSE), the correlation coefficient (R), and the coefficient of determination (R2). The mathematical expressions for these indicators are given below.
$RMSE = \sqrt{\dfrac{\sum_{i=1}^{n}(ex_i - mo_i)^2}{n}}$

$MAE = \dfrac{\sum_{i=1}^{n}\left|ex_i - mo_i\right|}{n}$

$RSE = \dfrac{\sum_{i=1}^{n}(mo_i - ex_i)^2}{\sum_{i=1}^{n}(\overline{ex} - ex_i)^2}$

$RRMSE = \dfrac{1}{\left|\overline{ex}\right|}\sqrt{\dfrac{\sum_{i=1}^{n}(ex_i - mo_i)^2}{n}}$

$R = \dfrac{\sum_{i=1}^{n}(ex_i - \overline{ex}_i)(mo_i - \overline{mo}_i)}{\sqrt{\sum_{i=1}^{n}(ex_i - \overline{ex}_i)^2\,\sum_{i=1}^{n}(mo_i - \overline{mo}_i)^2}}$

$\rho = \dfrac{RRMSE}{1 + R}$

where:
$ex_i$ = experimental (actual) strength;
$mo_i$ = model (predicted) strength;
$\overline{ex}_i$ = average value of the experimental outcomes;
$\overline{mo}_i$ = average value of the predicted outcomes.
In this paper, the performance of the model is also evaluated using the coefficient of determination (R2). The model is deemed effective when the value of R2 is greater than 0.8 and close to 1 [58]; this value reflects the correlation between the experimental and predicted outcomes. Lower values of the error indicators (MAE, RRMSE, RMSE, and RSE) indicate higher performance. Machine learning is a good approach for predicting such properties; however, overfitting has a detrimental effect on the validation and forecasting of the mechanical behavior of HSC, so addressing overfitting is essential in supervised machine learning algorithms. Researchers have therefore used an objective function (OBF) to assess model accuracy. The OBF accounts for the overall data sample together with the error and regression coefficient, providing a more accurate generalized model, and is represented in Equation (8) [59].
$OBF = \left(\dfrac{n_{Train} - n_{Test}}{n}\right)\rho_{Train} + 2\left(\dfrac{n_{Test}}{n}\right)\rho_{Test}$  (8)
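The indicators above can be computed directly; the sketch below is a plain NumPy transcription of the expressions for RMSE through ρ and the OBF of Equation (8), with ex and mo denoting experimental and predicted strengths:

```python
# Plain NumPy transcription of the performance indicators
# (ex = experimental values, mo = model predictions, both 1-D arrays).
import numpy as np

def rmse(ex, mo):
    return np.sqrt(np.mean((ex - mo) ** 2))

def mae(ex, mo):
    return np.mean(np.abs(ex - mo))

def rse(ex, mo):
    return np.sum((mo - ex) ** 2) / np.sum((ex.mean() - ex) ** 2)

def rrmse(ex, mo):
    return rmse(ex, mo) / abs(ex.mean())

def rho(ex, mo):
    r = np.corrcoef(ex, mo)[0, 1]
    return rrmse(ex, mo) / (1.0 + r)

def obf(ex_train, mo_train, ex_test, mo_test):
    # Equation (8): weight the training and testing rho values by the
    # relative sizes of the two subsets.
    n_train, n_test = len(ex_train), len(ex_test)
    n = n_train + n_test
    return ((n_train - n_test) / n) * rho(ex_train, mo_train) + \
           2.0 * (n_test / n) * rho(ex_test, mo_test)
```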

6. Results and Discussion

6.1. Random Forest Model Analysis

Random forest is an ensemble modeling algorithm that combines weak learners to give the best performance, as depicted in Figure 2. These algorithms are supervised learners giving high accuracy in terms of correlation. The model was run with up to twenty submodels to find the configuration giving the maximum coefficient of determination, as illustrated in Figure 2a. It can be seen that the submodel equal to 10 stands out and gives a strong relationship. This is due to the incorporation of weak learners (decision trees) within the ensembling algorithm. Moreover, the model gives a strong correlation of R2 = 0.96 between experimental and predicted values and good validation results, as illustrated in Figure 2b,c. In addition, the model shows low errors, as illustrated in Figure 2d: all the predicted data points lie in the same range as the experimental values, with errors of less than 10 MPa. This shows that the random forest ensemble algorithm gives good results.
Statistical checks are applied to assess the performance of the random forest model. This is an indirect method of showing model performance: the analyses quantify the errors in the model, so RMSE, MAE, RSE, and RRMSE are used, as shown in Table 7. The RF model is an ensemble and thus shows smaller errors in prediction.

6.2. Empirical Relation of HSC Using the GEP Model

Gene expression programming is an individual supervised machine learning approach that predicts the compressive strength using tree-based expressions. Moreover, GEP gives an empirical relation in terms of the input parameters, as shown in Equation (9). This simplified equation can then be used to predict the compressive strength of HSC. The equation comes from the expression tree, which uses a function set and a terminal set with basic mathematical operators, as shown in Figure 3, and it captures the relationship between the input parameters and the output strength. GEP utilizes linear as well as non-linear relationships in forecasting the mechanical properties.
$f_c\ (\mathrm{MPa}) = A + B + C$  (9)

where, as read from the expression tree in Figure 3,

$A = \big(19.97\ \mathrm{cement}\ (\mathrm{water} + \mathrm{superplasticizer}) + 15.31\big)$

$B = \big((5.32 + 2.41)\ \big((0.58\ F/C_{agg}) + \mathrm{superplasticizer}\ 0.50\,(F/C_{agg})\big)\big)$

$C = \big(\big(0.77\ 4.77\ \mathrm{cement} + 32.4\,\big((\mathrm{water} + \mathrm{superplasticizer})\ 8.64\big)\big) + \mathrm{superplasticizer}\big)$
Before running the GEP algorithm, the procedure starts with the selection of the number of chromosomes and the basic operators provided by the GEP software. The model is tuned by trial and error, with chromosomes of varying sizes and gene numbers combined with the available operators, thus ensuring the selection of the best model. The selected model contains the best/fittest gene available within the population, which gives the best performance. The most feasible and desirable outcome of the GEP model is $f_c$, expressed in the form of an expression tree as shown in Figure 3. The expression tree uses a linking function together with basic mathematical operators and some constants. It is worth mentioning that the GEP algorithm uses the RMSE function as its fitness measure.

6.3. GEP Model Evaluation

The model evaluation and the comparison between observed and predicted values are illustrated in Figure 4. The GEP-based machine learning algorithm is an effective approach to assess the strength parameters of HSC. Model assessment in machine learning is usually done with regression analysis; a regression value close to one indicates a highly accurate model. Figure 4a,b show the regression analysis of the validation and testing sets with the coefficient of determination R2. These values are greater than 0.8, with R2 of 0.91 for the testing set and 0.90 for the validation set (Figure 4a,b), which depicts the accuracy of the model. The data gathered from published literature were also normalized to the range of zero to one to show the consistency of the data, as illustrated in Figure 4c.
Statistical measures (MAE, RRMSE, RSE, and RMSE) are used to evaluate the performance of the model, as was done for the random forest model; the results are shown in Table 8. Low errors and a higher correlation coefficient indicate better model performance. Most of the errors lie below 5 MPa, with an R2 value greater than 0.8, which confirms the accuracy of the finalized model. Further analysis was also performed by determining the standard deviation (SD) and covariance (COV); the values of SD and COV are 0.16 and 0.059, respectively.
The accuracy and performance of the machine learning-based model are further evaluated through the error distribution between the actual targets and the predicted values of the testing set, as shown in Figure 5. It can be seen that the model predicts outcomes nearly equal to the experimental values. Moreover, the error distribution of the testing set shows that 86% of the data sample lies below 5 MPa and 13.88% lies in the range of 5 MPa to 8 MPa, with a maximum error of 7.47 MPa. Thus, the GEP-based model not only gives high accuracy in terms of correlation but also yields the empirical equation shown in Equation (9), which will help users predict the compressive strength of concrete by hand calculation.

7. Statistical Analysis Checks on RF and GEP Model

The accuracy of any model depends on the number of data points: the more points, the greater the accuracy of the model [60]. Frank et al. [60] suggested that the ratio of input data samples to the number of parameters involved should be equal to or greater than three for good model performance. This study uses 357 data samples with the 4 variables mentioned earlier, giving a ratio of 89.25. This value is far above the threshold, indicating the reliability of the model. Farjad et al. [33] used a similar approach to validate their model and obtained sound results with a ratio greater than 3. Researchers have suggested different approaches for the validation of a model using external statistical measures [61,62]. Golbraikh et al. [62] validated their model using the slope of the regression line (k or k′) through the origin, which measures the agreement between experimental and predicted values; a value close to 1 indicates good performance of the model [61]. All of these external checks are tabulated in Table 9.
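A short sketch of these external checks, following the reconstructed expressions in Table 9 and assuming e and m are NumPy arrays of experimental and predicted strengths, is given below:

```python
# Sketch of the external validation checks of Table 9.
import numpy as np

def k_slope(e, m):
    # Regression-through-origin slope of predicted vs. experimental values.
    return np.sum(e * m) / np.sum(e ** 2)

def k_prime_slope(e, m):
    return np.sum(e * m) / np.sum(m ** 2)

def r0_squared(e, m):
    # R_o^2 with e_i^o = k * m_i; values close to 1 indicate a valid model.
    e_o = k_slope(e, m) * m
    return 1.0 - np.sum((m - e_o) ** 2) / np.sum((m - m.mean()) ** 2)

def r0_prime_squared(e, m):
    # R_o'^2 with m_i^o = k' * e_i.
    m_o = k_prime_slope(e, m) * e
    return 1.0 - np.sum((e - m_o) ** 2) / np.sum((e - e.mean()) ** 2)
```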

8. Comparison of Models with ANN and Decision Tree

The ensemble RF and GEP approaches are compared with other supervised machine learning algorithms, namely ANN and DT, as depicted in Figure 6. These techniques, along with GEP, are individual algorithms, whereas RF is an ensemble that combines individual base learners through bagging to give a strong correlation. It should be kept in mind that all models were implemented in Python (Anaconda). The comparison of the models is presented in Figure 6. RF stands out with R2 = 0.96, as shown together with its error distribution in Figure 6a,b, whereas the individual ANN, DT, and GEP models show good responses with R2 = 0.89, 0.90, and 0.90, respectively. Figure 6d represents the error distribution of the decision tree, with most errors below 10 MPa but a maximum error of 18.19 MPa. A similar trend is observed for the ANN and GEP models, with maximum error values of 11.80 MPa and 7.48 MPa, respectively, as shown in Figure 6f,h. Moreover, other researchers have used different machine learning algorithms for the prediction of the mechanical properties of high strength concrete. Ahmed et al. [63] used an ANN algorithm to forecast the mechanical properties (slump and compressive strength) of HSC and reported strong correlations of about 0.99 for both slump and compressive strength. Singh et al. [64] forecasted the mechanical properties of HSC using RF and M5P algorithms and reported strong correlations for the testing set of 0.876 and 0.814, respectively.
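A minimal sketch of such a comparison in scikit-learn, reusing the train/test split from the earlier sketches, could look like this; the hidden-layer sizes and other hyperparameters are illustrative assumptions:

```python
# Sketch of the RF / DT / ANN comparison on the same train/test split.
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "RF":  RandomForestRegressor(n_estimators=100, random_state=42),
    "DT":  DecisionTreeRegressor(random_state=42),
    "ANN": MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test R2 = {model.score(X_test, y_test):.3f}")
```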

9. Permutation Feature Analysis (PFA)

Permutation feature analysis (PFA) is performed to determine the most influential parameters affecting the compressive strength of HSC. PFA is carried out using a Python library extension, and Figure 7 shows the results. All the variables considered in this study strongly affect the compressive strength of HSC; however, the effect of the superplasticizer is greater than that of the other variables.
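A sketch of this analysis with scikit-learn's permutation_importance, assuming the fitted RF model and the test split from the earlier sketches, is shown below:

```python
# Sketch of permutation feature importance: each input column is shuffled in
# turn and the resulting drop in test-set R2 measures its influence.
from sklearn.inspection import permutation_importance

result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
for name, score in zip(X_test.columns, result.importances_mean):
    print(f"{name}: {score:.3f}")
```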

10. Conclusions

Supervised machine learning can predict the mechanical properties of concrete and gives excellent results. This helps users forecast the desired properties rather than conducting an experimental program. The following conclusions are drawn from the machine learning algorithms used:
  • Random forest is an ensemble approach that gives strong performance between observed and predicted values. This is due to the incorporation of weak base learners (decision trees), and it gives a coefficient of determination of R2 = 0.96.
  • GEP is an individual model rather than an ensemble algorithm. It gives a good correlation and yields an empirical relation that can be used to predict the mechanical properties of high strength concrete via hand calculation.
  • The RF and GEP models were compared with ANN and DT. RF stands out with a strong correlation of R2 = 0.96, while the GEP model gives R2 = 0.90 and the ANN and DT models give 0.89 and 0.90, respectively. Moreover, RF gives smaller errors than the other, individual algorithms, owing to its bagging mechanism.
  • Permutation feature analysis identifies the most influential parameters for HSC. This helps determine the dominant variables for experimental work; all the variables have an effect on the compressive strength.

Supplementary Materials

The following are available online at https://www.mdpi.com/2076-3417/10/20/7330/s1, Table S1: Supplementary material.

Author Contributions

F.F., software and investigation; M.N.A., writing—review and editing; K.K., writing—review and editing; M.R.S., review and editing; M.F.J., graphs and review; F.A., editing and writing; R.A., funding and review. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This research was supported by the Deanship of Scientific Research (DSR) at King Faisal University (KFU) through the “18th Annual Research Project No. 180062”. The authors wish to express their gratitude for the financial support that made this study possible. This work was also supported by the Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University under research project number 2020/01/16810.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, X.; Han, J. The effect of ultra-fine admixture on the rheological property of cement paste. Cem. Concr. Res. 2000, 30, 827–830. [Google Scholar] [CrossRef]
  2. Khaloo, A.; Mobini, M.H.; Hosseini, P. Influence of different types of nano-SiO2 particles on properties of high-performance concrete. Constr. Build. Mater. 2016, 113, 188–201. [Google Scholar] [CrossRef]
  3. Hooton, R.D.; Bickley, J.A. Design for durability: The key to improving concrete sustainability. Constr. Build. Mater. 2014, 67, 422–430. [Google Scholar] [CrossRef]
  4. Farooq, F.; Akbar, A.; Khushnood, R.A.; Muhammad, W.L.B.; Rehman, S.K.U.; Javed, M.F. Experimental investigation of hybrid carbon nanotubes and graphite nanoplatelets on rheology, shrinkage, mechanical, and microstructure of SCCM. Materials 2020, 13, 230. [Google Scholar] [CrossRef] [Green Version]
  5. Carrasquillo, R.; Nilson, A.; Slate, F.S. Properties of High Strength Concrete Subjectto Short-Term Loads. 1981. Available online: https://www.concrete.org/publications/internationalconcreteabstractsportal.aspx?m=details&ID=6914 (accessed on 27 September 2020).
  6. Mbessa, M.; Péra, J. Durability of high-strength concrete in ammonium sulfate solution. Cem. Concr. Res. 2001, 31, 1227–1231. [Google Scholar] [CrossRef]
  7. Baykasoǧlu, A.; Öztaş, A.; Özbay, E. Prediction and multi-objective optimization of high-strength concrete parameters via soft computing approaches. Expert Syst. Appl. 2009, 36, 6145–6155. [Google Scholar] [CrossRef]
  8. Demir, F. Prediction of elastic modulus of normal and high strength concrete by artificial neural networks. Constr. Build. Mater. 2008, 22, 1428–1435. [Google Scholar] [CrossRef]
  9. Demir, F. A new way of prediction elastic modulus of normal and high strength concrete-fuzzy logic. Cem. Concr. Res. 2005, 35, 1531–1538. [Google Scholar] [CrossRef]
  10. Yan, K.; Shi, C. Prediction of elastic modulus of normal and high strength concrete by support vector machine. Constr. Build. Mater. 2010, 24, 1479–1485. [Google Scholar] [CrossRef]
  11. Ahmadi-Nedushan, B. Prediction of elastic modulus of normal and high strength concrete using ANFIS and optimal nonlinear regression models. Constr. Build. Mater. 2012, 36, 665–673. [Google Scholar] [CrossRef]
  12. Safiuddin, M.; Raman, S.N.; Salam, M.A.; Jumaat, M.Z. Modeling of compressive strength for self-consolidating high-strength concrete incorporating palm oil fuel ash. Materials 2016, 9, 396. [Google Scholar] [CrossRef] [PubMed]
  13. Al-Shamiri, A.K.; Kim, J.H.; Yuan, T.F.; Yoon, Y.S. Modeling the compressive strength of high-strength concrete: An extreme learning approach. Constr. Build. Mater. 2019, 208, 204–219. [Google Scholar] [CrossRef]
  14. Aslam, F.; Farooq, F.; Amin, M.N.; Khan, K.; Waheed, A.; Akbar, A.; Javed, M.F.; Alyousef, R.; Alabdulijabbar, H. Applications of Gene Expression Programming for Estimating Compressive Strength of High-Strength Concrete. Adv. Civ. Eng. 2020, 2020, 1–23. [Google Scholar] [CrossRef]
  15. Samui, P. Multivariate adaptive regression spline (MARS) for prediction of elastic modulus of jointed rock mass. Geotech. Geol. Eng. 2013, 31, 249–253. [Google Scholar] [CrossRef]
  16. Gholampour, A.; Mansouri, I.; Kisi, O.; Ozbakkaloglu, T. Evaluation of mechanical properties of concretes containing coarse recycled concrete aggregates using multivariate adaptive regression splines (MARS), M5 model tree (M5Tree), and least squares support vector regression (LSSVR) models. Neural Comput. Appl. 2020, 32, 295–308. [Google Scholar] [CrossRef]
  17. Shahmansouri, A.A.; Bengar, H.A.; Ghanbari, S. Compressive strength prediction of eco-efficient GGBS-based geopolymer concrete using GEP method. J. Build. Eng. 2020, 31, 101326. [Google Scholar] [CrossRef]
  18. Javed, M.F.; Farooq, F.; Memon, S.A.; Akbar, A.; Khan, M.A.; Aslam, F.; Alyousef, R.; Alabduljabbar, H.; Rehman, S.K.U. New prediction model for the ultimate axial capacity of concrete-filled steel tubes: An evolutionary approach. Crystals 2020, 10, 741. [Google Scholar] [CrossRef]
  19. Sonebi, M.; Abdulkadir, C. Genetic programming based formulation for fresh and hardened properties of self-compacting concrete containing pulverised fuel ash. Constr. Build. Mater. 2009, 23, 2614–2622. [Google Scholar] [CrossRef]
  20. Rinchon, J.P.M. Strength durability-based design mix of self-compacting concrete with cementitious blend using hybrid neural network-genetic algorithm. IPTEK J. Proc. Ser. 2017, 3. [Google Scholar] [CrossRef] [Green Version]
  21. Kang, F.; Li, J.; Dai, J. Prediction of long-term temperature effect in structural health monitoring of concrete dams using support vector machines with Jaya optimizer and salp swarm algorithms. Adv. Eng. Softw. 2019, 131, 60–76. [Google Scholar] [CrossRef]
  22. Ling, H.; Qian, C.; Kang, W.; Liang, C.; Chen, H. Combination of support vector machine and K-fold cross validation to predict compressive strength of concrete in marine environment. Constr. Build. Mater. 2019, 206, 355–363. [Google Scholar] [CrossRef]
  23. Ababneh, A.; Alhassan, M.; Abu-Haifa, M. Predicting the contribution of recycled aggregate concrete to the shear capacity of beams without transverse reinforcement using artificial neural networks. Case Stud. Constr. Mater. 2020, 13, e00414. [Google Scholar] [CrossRef]
  24. Xu, J.; Chen, Y.; Xie, T.; Zhao, X.; Xiong, B.; Chen, Z. Prediction of triaxial behavior of recycled aggregate concrete using multivariable regression and artificial neural network techniques. Constr. Build. Mater. 2019, 226, 534–554. [Google Scholar] [CrossRef]
  25. Van Dao, D.; Ly, H.B.; Vu, H.L.T.; Le, T.T.; Pham, B.T. Investigation and optimization of the C-ANN structure in predicting the compressive strength of foamed concrete. Materials 2020, 13, 1072. [Google Scholar] [CrossRef] [Green Version]
  26. Han, Q.; Gui, C.; Xu, J.; Lacidogna, G. A generalized method to predict the compressive strength of high-performance concrete by improved random forest algorithm. Constr. Build. Mater. 2019, 226, 734–742. [Google Scholar] [CrossRef]
  27. Zounemat-Kermani, M.; Stephan, D.; Barjenbruch, M.; Hinkelmann, R. Ensemble data mining modeling in corrosion of concrete sewer: A comparative study of network-based (MLPNN & RBFNN) and tree-based (RF, CHAID, & CART) models. Adv. Eng. Inform. 2020, 43, 101030. [Google Scholar] [CrossRef]
  28. Zhang, J.; Li, D.; Wang, Y. Toward intelligent construction: Prediction of mechanical properties of manufactured-sand concrete using tree-based models. J. Clean. Prod. 2020, 258, 120665. [Google Scholar] [CrossRef]
  29. Vakhshouri, B.; Nejadi, S. Predicition of compressive strength in light-weight self-compacting concrete by ANFIS analytical model. Arch. Civ. Eng. 2015, 61, 53–72. [Google Scholar] [CrossRef]
  30. Dutta, S.; Murthy, A.R.; Kim, D.; Samui, P. Prediction of Compressive Strength of Self-Compacting Concrete Using Intelligent Computational Modeling. 2017. Available online: https://www.researchgate.net/publication/321700276 (accessed on 27 September 2020).
  31. Vakhshouri, B.; Nejadi, S. Prediction of compressive strength of self-compacting concrete by ANFIS models. Neurocomputing 2018, 280, 13–22. [Google Scholar] [CrossRef]
  32. Info, A. Application of ANN and ANFIS Models Determining Compressive Strength of Concrete. Soft Comput. Civ. Eng. 2018, 2, 62–70. Available online: http://www.jsoftcivil.com/article_51114.html (accessed on 27 September 2020).
  33. Iqbal, M.F.; Liu, Q.f.; Azim, I.; Zhu, X.; Yang, J.; Javed, M.F.; Rauf, M. Prediction of mechanical properties of green concrete incorporating waste foundry sand based on gene expression programming. J. Hazard. Mater. 2020, 384, 121322. [Google Scholar] [CrossRef]
  34. Trtnik, G.; Kavčič, F.; Turk, G. Prediction of concrete strength using ultrasonic pulse velocity and artificial neural networks. Ultrasonics 2009, 49, 53–60. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Shahmansouri, A.A.; Yazdani, M.; Ghanbari, S.; Bengar, H.A.; Jafari, A.; Ghatte, H.F. Artificial neural network model to predict the compressive strength of eco-friendly geopolymer concrete incorporating silica fume and natural zeolite. J. Clean. Prod. 2020, 279, 123697. [Google Scholar] [CrossRef]
  36. Javed, M.F.; Amin, M.N.; Shah, M.I.; Khan, K.; Iftikhar, B.; Farooq, F.; Aslam, F.; Alyousef, R.; Alabduljabbar, H. Applications of gene expression programming and regression techniques for estimating compressive strength of bagasse Ash based concrete. Crystals 2020, 10, 737. [Google Scholar] [CrossRef]
  37. Nour, A.I.; Güneyisi, E.M. Prediction model on compressive strength of recycled aggregate concrete filled steel tube columns. Compos. Part B Eng. 2019, 173. [Google Scholar] [CrossRef]
  38. Zhang, J.; Ma, G.; Huang, Y.; Sun, J.; Aslani, F.; Nener, B. Modelling uniaxial compressive strength of lightweight self-compacting concrete using random forest regression. Constr. Build. Mater. 2019, 210, 713–719. [Google Scholar] [CrossRef]
  39. Sun, Y.; Li, G.; Zhang, J.; Qian, D. Prediction of the strength of rubberized concrete by an evolved random forest model. Adv. Civ. Eng. 2019. [Google Scholar] [CrossRef] [Green Version]
  40. Bingöl, A.F.; Tortum, A.; Gül, R. Neural networks analysis of compressive strength of lightweight concrete after high temperatures. Mater. Des. 2013, 52, 258–264. [Google Scholar] [CrossRef]
  41. Duan, Z.H.; Kou, S.C.; Poon, C.S. Prediction of compressive strength of recycled aggregate concrete using artificial neural networks. Constr. Build. Mater. 2013, 40, 1200–1206. [Google Scholar] [CrossRef]
  42. Chou, J.S.; Pham, A.D. Enhanced artificial intelligence for ensemble approach to predicting high performance concrete compressive strength. Constr. Build. Mater. 2013, 49, 554–563. [Google Scholar] [CrossRef]
  43. Chou, J.S.; Tsai, C.F.; Pham, A.D.; Lu, Y.H. Machine learning in concrete strength simulations: Multi-nation data analytics. Constr. Build. Mater. 2014, 73, 771–780. [Google Scholar] [CrossRef]
  44. Azim, I.; Yang, J.; Javed, M.F.; Iqbal, M.F.; Mahmood, Z.; Wang, F.; Liu, Q.f. Prediction model for compressive arch action capacity of RC frame structures under column removal scenario using gene expression programming. Structures 2020, 25, 212–228. [Google Scholar] [CrossRef]
  45. Pala, M.; Özbay, E.; Öztaş, A.; Yuce, M.I. Appraisal of long-term effects of fly ash and silica fume on compressive strength of concrete by neural networks. Constr. Build. Mater. 2007, 21, 384–394. [Google Scholar] [CrossRef]
  46. Anaconda Inc. Anaconda Individual Edition, Anaconda Website. 2020. Available online: https://www.anaconda.com/products/individual (accessed on 27 September 2020).
  47. Downloads, (n.d.). Available online: https://www.gepsoft.com/downloads.htm (accessed on 27 September 2020).
  48. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  49. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef]
  50. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 2015, 42, 2162–2172. [Google Scholar] [CrossRef]
  51. Jiang, H.; Deng, Y.; Chen, H.S.; Tao, L.; Sha, Q.; Chen, J.; Tsai, C.J.; Zhang, S. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinform. 2004, 5. [Google Scholar] [CrossRef] [Green Version]
  52. Prasad, A.M.; Iverson, L.R.; Liaw, A. Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems 2006, 9, 181–199. [Google Scholar] [CrossRef]
  53. Ferreira, C. Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. 2001. Available online: http://www.gene-expression-programming.com (accessed on 29 March 2020).
  54. Behnood, A.; Golafshani, E.M. Predicting the compressive strength of silica fume concrete using hybrid artificial neural network with multi-objective grey wolves. J. Clean. Prod. 2018, 202, 54–64. [Google Scholar] [CrossRef]
  55. Getahun, M.A.; Shitote, S.M.; Gariy, Z.C.A. Artificial neural network based modelling approach for strength prediction of concrete incorporating agricultural and construction wastes. Constr. Build. Mater. 2018, 190, 517–525. [Google Scholar] [CrossRef]
  56. Project Jupyter, Project Jupyter, Home. 2017. Available online: https://jupyter.org/ (accessed on 27 September 2020).
  57. Gholampour, A.; Gandomi, A.H.; Ozbakkaloglu, T. New formulations for mechanical properties of recycled aggregate concrete using gene expression programming. Constr. Build. Mater. 2017, 130, 122–145. [Google Scholar] [CrossRef]
  58. Gandomi, A.H.; Babanajad, S.K.; Alavi, A.H.; Farnam, Y. Novel approach to strength modeling of concrete under triaxial compression. J. Mater. Civ. Eng. 2012, 24, 1132–1143. [Google Scholar] [CrossRef]
  59. Gandomi, A.H.; Roke, D.A. Assessment of artificial neural network and genetic programming as predictive tools. Adv. Eng. Softw. 2015, 88, 63–72. [Google Scholar] [CrossRef]
  60. Frank, I.; Todeschini, R. The data analysis handbook. Data Handl. Sci. Technol. 1994, 14, 1–352. [Google Scholar] [CrossRef]
  61. Alavi, A.H.; Ameri, M.; Gandomi, A.H.; Mirzahosseini, M.R. Formulation of flow number of asphalt mixes using a hybrid computational method. Constr. Build. Mater. 2011, 25, 1338–1355. [Google Scholar] [CrossRef]
  62. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269–276. [Google Scholar] [CrossRef]
  63. Öztaş, A.; Pala, M.; Özbay, E.; Kanca, E.; Çaǧlar, N.; Bhatti, M.A. Predicting the compressive strength and slump of high strength concrete using neural network. Constr. Build. Mater. 2006, 20, 769–775. [Google Scholar] [CrossRef]
  64. Singh, B.; Singh, B.; Sihag, P.; Tomar, A.; Sehgal, A. Estimation of compressive strength of high-strength concrete by random forest and M5P model tree approaches. J. Mater. Eng. Struct. JMES 2019, 6, 583–592. Available online: http://revue.ummto.dz/index.php/JMES/article/view/2020 (accessed on 21 August 2020).
Figure 1. Hex contour graph of input parameters; (a) Cement; (b) Coarse aggregate; (c) Fine aggregate; (d) Super plasticizer; (e) Water; (f) Compressive strength.
Figure 2. Model evaluation (a) Ensemble model with 20 submodels; (b) validation based on RF; (c) testing based on RF; (d) error distribution of the testing set.
Figure 3. Expression tree of high strength concrete (HSC) using gene expression.
Figure 4. Model evaluation (a) Validation results of data based on GEP; (b) testing results of data; (c) normalized range of data.
Figure 5. Distribution of data with error range.
Figure 6. Model evaluation with errors (a) RF regression analysis; (b) error distribution based on the RF model; (c) decision tree (DT) regression analysis; (d) error distribution based on DT; (e) artificial neural network (ANN) regression analysis; (f) error distribution based on ANN; (g) GEP regression analysis; (h) error distribution based on GEP.
Figure 7. Permutation analysis of input variables (a) model base (b) contribution of input variables.
Table 1. Algorithms used in the prediction of properties of high strength concrete.

Properties | Data Points | Algorithm | References
Compressive strength, slump test | 187 | ANN | [7]
Elastic modulus | 159 | ANN | [8]
Elastic modulus | 159 | FUZZY | [9]
Elastic modulus | 159 | SVM | [10]
Elastic modulus | 159 | ANFIS and nonlinear regression | [11]
Compressive strength | 20 | ANN | [12]
Compressive strength | 324 | ELM | [13]
Compressive strength | 357 | GEP | [14]
Table 2. Statistical description of all data points used in the model (kg/m³).

Parameters | Cement | Fine/Coarse Aggregate | Water | Superplasticizer
Mean | 384.34 | 0.96 | 173.56 | 2.34
Standard Error | 4.92 | 0.01 | 0.82 | 0.14
Median | 360 | 0.92 | 170 | 1.25
Mode | 360 | 1.01 | 170 | 1
Standard Deviation | 93.00 | 0.26 | 15.56 | 2.69
Sample Variance | 8650.50 | 0.06 | 242.19 | 7.24
Kurtosis | 0.36 | 6.45 | 15.59 | 2.88
Skewness | 0.14 | 2.12 | 2.45 | 1.79
Range | 440 | 1.86 | 170.08 | 12
Minimum | 160 | 0.23 | 132 | 0
Maximum | 600 | 2.1 | 302.08 | 12
Sum | 137,212.84 | 344.07 | 61,963.8 | 837.61
Count | 357 | 357 | 357 | 357
Table 3. Statistical description of training data points used in the model (kg/m³).

Parameters | Cement | Fine/Coarse Aggregate | Water | Superplasticizer
Mean | 383.29 | 0.97 | 173.72 | 2.42
Standard Error | 6.06 | 0.01 | 1.08 | 0.17
Median | 360 | 0.92 | 170 | 1.37
Mode | 320 | 1.01 | 170 | 1
Standard Deviation | 95.95 | 0.27 | 17.17 | 2.74
Sample Variance | 9206.57 | 0.07 | 295.07 | 7.54
Kurtosis | 0.60 | 5.82 | 14.42 | 2.96
Skewness | 0.19 | 2.08 | 2.48 | 1.82
Range | 420 | 1.86 | 170.08 | 12
Minimum | 180 | 0.23 | 132 | 0
Maximum | 600 | 2.1 | 302.08 | 12
Sum | 95,823.1 | 242.79 | 43,431.75 | 606.43
Count | 250 | 250 | 250 | 250
Table 4. Statistical description of testing data points used in the model (kg/m³).

Parameters | Cement | Fine/Coarse Aggregate | Water | Superplasticizer
Mean | 387.04 | 0.92 | 172.18 | 1.98
Standard Error | 12.46 | 0.02 | 1.34 | 0.33
Median | 400 | 0.90 | 170 | 1
Mode | 360 | 0.75 | 170 | 1
Standard Deviation | 95.76 | 0.18 | 10.35 | 2.55
Sample Variance | 9170.56 | 0.03 | 107.25 | 6.55
Kurtosis | 0.22 | 6.82 | 0.18 | 4.75
Skewness | 0.17 | 1.66 | 0.33 | 2.19
Range | 440 | 1.22 | 45.2 | 12
Minimum | 160 | 0.58 | 154.8 | 0
Maximum | 600 | 1.80 | 200 | 12
Sum | 22,835.54 | 54.38 | 10,159.18 | 117.09
Count | 54 | 54 | 54 | 54
Table 5. Statistical description of validation data points used in the model (kg/m³).

Parameters | Cement | Fine/Coarse Aggregate | Water | Superplasticizer
Mean | 390.52 | 0.90 | 173.07 | 2.10
Standard Error | 12.58 | 0.02 | 1.21 | 0.34
Median | 378 | 0.90 | 175 | 1
Mode | 360 | 1.04 | 180 | 0.5
Standard Deviation | 89.86 | 0.15 | 8.67 | 2.47
Sample Variance | 8076.29 | 0.02 | 75.21 | 6.11
Kurtosis | 1.08 | 0.52 | −0.18 | 2.17
Skewness | 0.17 | 0.61 | −0.62 | 1.65
Range | 440 | 0.73 | 38.32 | 10.5
Minimum | 160 | 0.66 | 154 | 0
Maximum | 600 | 1.39 | 192.32 | 10.5
Sum | 19,916.87 | 46.34 | 8826.8 | 107.57
Count | 55 | 55 | 55 | 55
Table 6. Input parameters assigned in the gene expression programming (GEP) model.

Parameters | Settings
General | fc
Genes | 4
Chromosomes | 30
Linking function | Addition
Head size | 10
Function set | +, −, ×, ÷
Numerical constants:
Constant per gene | 10
Lower bound | −10
Data type | Floating number
Upper bound | 10
Genetic operators:
Two-point recombination rate | 0.00277
Gene transposition rate | 0.00277
Table 7. Random forest (RF) statistical analysis.

Model: fc | Validation | Testing
RMSE | 1.22 | 1.42
MAE | 0.475 | 0.495
R2 | 0.967 | 0.041
RRMSE | 0.0186 | 0.021
RSE | 0.072 | 0.053
ρ | 0.024 | 0.025
Table 8. Statistical calculations of the proposed model.

Model: fc | Validation | Testing
RMSE | 1.42 | 1.62
MAE | 0.575 | 0.595
RSE | 0.092 | 0.023
RRMSE | 0.0286 | 0.031
R | 0.957 | 0.031
ρ | 0.014 | 0.015
Table 9. Statistical analysis of RF and GEP models from external validation.

S.No | Equation | Condition | RF Model | GEP Model
1 | $k = \sum_{i=1}^{n}(e_i \times m_i) / e_i^2$ | 0.85 < k < 1.15 | 0.99 | 0.98
2 | $k' = \sum_{i=1}^{n}(e_i \times m_i) / m_i^2$ | 0.85 < k' < 1.15 | 1.00 | 1.00
3 | $R_o^2 = 1 - \sum_{i=1}^{n}(m_i - e_i^{o})^2 / \sum_{i=1}^{n}(m_i - \overline{m}_i)^2,\ e_i^{o} = k \times m_i$ | $R_o^2 \cong 1$ | 0.99 | 0.97
4 | $R_o'^2 = 1 - \sum_{i=1}^{n}(e_i - m_i^{o})^2 / \sum_{i=1}^{n}(e_i - \overline{e}_i)^2,\ m_i^{o} = k' \times e_i$ | $R_o'^2 \cong 1$ | 0.99 | 0.99
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

