1 Introduction

It is expected that the recent increase in the concentration of atmospheric carbon dioxide and in global temperature will have a significant impact on precipitation patterns at both global and regional scales [1, 7, 40]. Like the other water cycle components, the precipitation process is highly complex owing to the stochastic attributes of its triggering factors, such as temperature, wind speed, and humidity. The level of uncertainty in precipitation forecasting is therefore significantly higher than in other water cycle components such as temperature and streamflow [19, 38]. Consequently, the problem has been addressed in a variety of recent studies (e.g., [8, 22, 33, 39]). Many efforts have been made over the past decades to model precipitation patterns at specific locations. Using the theory of Markov chains and classic statistical models, observed precipitation series have been used to model and predict precipitation at different lead times [12, 43]. However, such models are not accurate enough for long-lead-time forecasting owing to the highly nonlinear structure and the non-stationarity of monthly and seasonal rainfall series [10, 15]. Nowadays, emerging artificial intelligence (AI) techniques such as artificial neural networks (ANNs), support vector machines (SVMs), extreme learning machines, and genetic programming (GP) have been used for precipitation forecasting (e.g., [2, 11, 24, 34]). However, these techniques have been criticized as black boxes, and their results were not precise enough, particularly in the prediction of drizzles or heavy rainstorms [25].

To increase prediction accuracy, the current literature shows an increasing trend toward hybrid AI models that combine different data pre- or post-processing approaches with one or more AI techniques [14, 18, 20, 32, 41, 44]. For example, Sivapragasam et al. [35] and Chau and Wu [4] coupled singular spectrum analysis with SVM and showed that the hybrid model may perform better than ad hoc SVM and ANN for rainfall forecasting. In a recent study, Danandeh Mehr et al. [9] confirmed that the firefly optimization algorithm can be used to determine SVM parameters and increase its predictive accuracy. More recently, Yaseen et al. [42] hybridized an adaptive neuro-fuzzy inference system model with evolutionary algorithms including particle swarm optimization, the genetic algorithm, and differential evolution. The capability of the proposed hybrid models for precipitation forecasting was compared with that of the conventional neuro-fuzzy model. The results showed that all the hybrid models improve the neuro-fuzzy inference system; however, particle swarm optimization was superior.

Although the above-mentioned hybrid AI models may provide higher prediction accuracy than standalone ones, they are still black boxes and, in most cases, possess a complicated model structure. These two factors discourage practitioners from applying such hybrid models in practice. Thus, additional efforts are needed to address the key issues of accuracy and complexity in precipitation forecasting models. The aim of this study is, therefore, to develop a new hybrid model that meets both precision and simplicity conditions. To this end, several linear and nonlinear integrations of GP and gene expression programming (hereafter GEP) are examined. The main contributions of this study are threefold. First, the present study, for the first time, investigates the predictive capabilities of standalone GP and GEP for seasonal precipitation forecasting. These are emerging AI techniques with explicit tree structures, typically known as grey-box techniques [16]; thus, they overcome the ultimate black-box feature of earlier models. Second, this study integrates linear and nonlinear regression techniques to develop an ensemble model that prioritizes both predictive accuracy and model simplicity. The proposed method can be used to forecast seasonal precipitation with an explicit tree structure. Finally, this is the first study that compares the ability of GP- and random forest (RF)-based models for long-term precipitation modeling. RF is a robust regression and classification technique that was considered as the benchmark in this study.

2 Study area and data preparation

The Antalya Province, with a population of more than two million, is located on the Mediterranean coast of southwest Turkey, between the Taurus Mountains and the Mediterranean Sea (Fig. 1a). Modern precipitation measurement in Antalya began in 1929. The data used in this study are from the Muratpasa station (36.9063 N, 30.7990 E) located in downtown Antalya (Fig. 1b, c). Daily precipitation data from January 1, 1930, to December 31, 2017, were collected from the Turkish State Meteorological Service and used in this study.

Fig. 1
figure 1

Location of the Antalya Province (a), meteorology station (b, c), and mean monthly precipitation for the period 1930–2017 (d)

The mean monthly precipitation for the period 1930–2017 is depicted in Fig. 1d. It reflects the Mediterranean climate that prevails in Antalya, with hot and dry summers and wet autumns, winters, and springs. Because of the pronounced variation in the mean monthly precipitation, conventional autoregressive time series modeling approaches would not be a wise choice for long-term precipitation forecasting at the Muratpasa station. From a meteorological perspective, summer precipitation in the region is generally convective, with high intensity and short duration. Significant differences in the seasonal distribution of the number of rainy days are also observed. While the long-term average number of rainy days in summer is about 4.0, the corresponding numbers for winter, spring, and autumn are 34, 22, and 15 days, respectively [5]. Owing to such a significant difference in the precipitation pattern between summer and the wet seasons, this study aims at modeling and forecasting seasonal precipitation in the wet seasons (autumn, winter, and spring), in which the evolved models merely use historical data from the wet seasons. From a modeling point of view, this strategy eliminates abundant zero values from the input vectors (predictors) and consequently leads to more precise predictions for the wet seasons. There are 88 years of observed data with a total of 352 (88 × 4) seasons. Hence, this study is based on 264 (88 × 3) seasons. The time series plot of the observed total seasonal precipitation (TSP) and its statistical characteristics are presented in Fig. 2 and Table 1, respectively. For information about the socioeconomic development of the Antalya Province, the interested reader is referred to the recent paper by Özkubat and Selim [30].

Fig. 2
figure 2

Total seasonal precipitation observed in Antalya, Turkey

Table 1 Statistical characteristics of the total seasonal precipitation observed in rainy season at Antalya during the 1930–2017 period

3 Methodology

3.1 Determination of optimum predictors

Determination of optimum inputs is an important task in any system identification or hydrological modeling problem, as it dictates the accuracy and complexity of the evolved models. In previous studies, different input combinations were examined via a trial-and-error method to determine suitable predictors (e.g., [28, 42]). However, this is time-consuming, and the modeler may miss the best inputs if they are not considered as potential predictors in advance. In this study, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the TSP series were considered to determine the optimum inputs (Fig. 3). A seasonal oscillation pattern is seen in the ACF diagram. As expected, the partial correlation markedly decreases after nine lags. These patterns point to the selection of the lags expressed in Eq. (1). The structure optimization features of GP, GEP, and RF are used to select the most effective inputs in the present study.

$$P_{t} = f\left( {P_{t - 9} ,P_{t - 6} ,P_{t - 3} ,P_{t - 2} ,P_{t - 1} } \right),$$
(1)

where Pt is the TSP amount at season t.
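As a rough sketch of this lag-selection step, the sample ACF can be computed directly. The series below is synthetic, with a period-3 cycle standing in for the wet-season rhythm of the TSP series; all names and values are illustrative, not the study's data:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])

# Synthetic stand-in for 264 seasons of TSP with a 3-season cycle
rng = np.random.default_rng(0)
t = np.arange(264)
series = 10 + 3 * np.sin(2 * np.pi * t / 3) + rng.normal(0, 0.5, 264)

r = acf(series, 9)
# Lags whose autocorrelation exceeds a rough 95% band are candidate predictors
candidates = [k for k in range(1, 10) if abs(r[k]) > 2 / np.sqrt(len(series))]
```

A strong autocorrelation at lag 3 and its multiples in such a series would point to the seasonal lags retained in Eq. (1).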

Fig. 3
figure 3

Correlogram of the observed total seasonal precipitation

3.2 Multiple linear regression (MLR)

MLR is an extension of the simple linear regression method in which the dependent variable (Pt) is linearly regressed on the selected five lagged TSP values, as shown in Eq. (2). The m-values are the coefficients corresponding to each predictor, and b is a constant. The attained equation is a hyperplane that best fits the TSP data and can be used to predict the target value from unseen independent variables within their initial ranges. The unknown coefficients and the constant can be calculated using the least squares method. The task can be easily done using the LINEST function in MS Excel.

$$P_{t} = m_{1} P_{t - 9} + m_{2} P_{t - 6} + m_{3} P_{t - 3} + m_{4} P_{t - 2} + m_{5} P_{t - 1} + b.$$
(2)
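The least squares fit behind Eq. (2) can also be sketched with NumPy instead of LINEST. The lagged-predictor matrix and coefficient values below are synthetic and merely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Columns stand for the lagged predictors P_{t-9}, P_{t-6}, P_{t-3}, P_{t-2}, P_{t-1}
X = rng.uniform(0.1, 0.9, size=(200, 5))
true_m = np.array([0.27, 0.21, 0.20, -0.16, -0.15])   # assumed coefficients
y = X @ true_m + 0.16 + rng.normal(0, 0.01, 200)

# Append a ones column so the constant b is estimated together with the m-values
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
m, b = coef[:-1], coef[-1]
```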

3.3 Overview of classic GP

GP [21] is an emerging evolutionary AI technique frequently used for symbolic regression problems. The best solution is an individual computer program generated on the basis of the principle of the survival of the fittest [36]. In classic GP, each individual has the form of a tree, called the genome. For example, Fig. 4 represents a genome and the corresponding mathematical equation encompassing a root node (addition), inner nodes of multiplication and addition, and terminal nodes of the independent variables (x1 and x2) and random numbers (R1 and R2). Each node in a genome accepts only specific types of elements. While the root node merely accepts a function (logical, mathematical, etc.), the inner nodes can be filled by a function or an independent variable. Therefore, creating a set of appropriate functions and variables is one of the most important tasks of the GP modeler. To avoid overfitting and to keep away from intricate solutions, the use of low-depth genomes has been suggested [17]. A wise decision on these factors not only supports the algorithm in achieving more accurate solutions but also reduces the computational cost.

Fig. 4
figure 4

A tree-shaped genome and the corresponding mathematical representation

To solve a regression problem using GP, the algorithm starts with the formation of an initial population of potential solutions using the modeler-defined functions and variables. The individuals that show a better fit between the dependent and independent variables are considered as parents to produce offspring. Typically, three evolutionary operators, namely reproduction, crossover, and mutation, act on the parents [6]. Reproduction relocates the best individual into the new set of offspring without any transformation. The process commences with the goodness-of-fit assessment of the initial programs and ceases after the identification of the best-fitted one. Crossover is the exchange of branches/nodes between two high-performance individuals that results in two offspring [6]. Figure 5 illustrates the process, in which a branch of parent 1 (red nodes) is replaced with a branch of parent 2 (blue nodes). The offspring are solutions that possess the genetic material of their parents. Many studies have shown that offspring fit the training set better than their parents [16]. As illustrated in Fig. 6, mutation is the random change of branches/nodes (i.e., genetic material) in a single parent at the mutation point. In the figure, the branch x1 × sin x2 in the parent was replaced with the randomly created Log x2 in the offspring. All the above-mentioned operations are repeated until an individual displays the desired level of accuracy in both the training and holdout testing data sets.
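As a toy illustration of how subtree crossover can be implemented, the sketch below represents genomes as nested lists and grafts a random branch of one parent onto the other. It mirrors the operators described above in spirit only; it is not the GPdotNET implementation, and the example genomes are invented:

```python
import random

random.seed(42)

# A genome is a nested list [function, child, ...]; terminals are plain strings/numbers
parent1 = ['+', ['*', 'x1', ['sin', 'x2']], 'x1']
parent2 = ['-', ['log', 'x2'], ['+', 'x2', 2.0]]

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs for every node in the genome."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of the genome with the node at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace(copy[path[0]], path[1:], new)
    return copy

def crossover(p1, p2):
    """Graft a random branch of p2 onto a random point of p1 (one offspring)."""
    point, _ = random.choice(list(subtrees(p1)))
    _, donor = random.choice(list(subtrees(p2)))
    return replace(p1, point, donor)

offspring = crossover(parent1, parent2)
```

Mutation is essentially the same `replace` call with a freshly generated random branch instead of a donor taken from a second parent.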

Fig. 5
figure 5

An example of crossover operation

Fig. 6
figure 6

Mutation operation acts on a GP chromosome

3.4 Overview of GEP

GEP is a GP variant that uses fixed-length linear genomes, also known as chromosomes, to represent computer programs in the form of expression trees [13]. It is trained with the same evolutionary process acting on a population of initial chromosomes of different sizes and shapes. The parents are selected according to fitness, and genetic operators are implemented to improve their fitness. A linking function (e.g., addition, multiplication, etc.) is used to join the evolved offspring, and the best model is selected among the joint expressions. An example of a linear GEP chromosome and the associated mathematical function is presented in Fig. 7. The chromosome has five functions (\({\text{Sin}} , + , - , \times , \div\)) and four terminals (X1, X2, 3, 2), where X1 and X2 are the input variables, and 3 and 2 are random numbers. The multiplication function between the parentheses (i.e., between the sub-ETs) is called the linking function.

Fig. 7
figure 7

An example of GEP chromosome representing \(\left( {{\text{Sin}} \left( {X_{1} + X_{2} } \right) - X_{1} } \right)\frac{{2X_{2} }}{{X_{1} }}\)

As illustrated in Fig. 7, the chromosome comprises two parts: the head (blue panel), which can take members from both the function and terminal sets, and the tail (green panel), which is formed merely of terminals. According to Ferreira [13], the length of the head h is chosen by the modeler, and then Eq. (3) is used to determine the length of the tail t.

$$t = h \times \left( {n - 1} \right) + 1,$$
(3)

where n is the maximum arity of all predefined functions. For details about evolutionary operations in GP and GEP, the reader is referred to Danandeh Mehr et al. [6].
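For instance, with a maximum function arity of 2 (all functions in Fig. 7 except Sin take two arguments), Eq. (3) can be checked directly; the head length of 7 is an arbitrary example:

```python
def tail_length(head_length, max_arity):
    """Eq. (3): t = h * (n - 1) + 1 guarantees enough terminals for any head."""
    return head_length * (max_arity - 1) + 1

# Example: a head of length 7 with binary functions needs a tail of length 8,
# giving a total chromosome length of 15
t = tail_length(7, 2)
```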

3.5 The proposed ensemble genetic programming (EGP) model

The proposed EGP is a hybrid model (Fig. 8) that aims at increasing the accuracy of standalone classic multiple and symbolic evolutionary regression models. While the standalone regression models map the independent variables to the dependent variable using either the method of least squares or an evolutionary optimization technique, the new model improves the predictive accuracy through the combination of both tools. As illustrated in Fig. 8, the proposed model includes three stages: data preparation (Stage 1), standalone modeling (Stage 2), and ensemble modeling (Stage 3).

Fig. 8
figure 8

Schematic of the proposed EGP model for seasonal precipitation forecasting

In the first stage, the historical data are gathered and preprocessed to determine the input/output time series and to feed the standalone regression models. In the second stage, a classic linear regression model (i.e., MLR) together with two symbolic regression models (i.e., GP and GEP) is individually evolved to forecast Pt. Finally, in the third stage (i.e., the ensemble modeling unit), the outputs obtained from the standalone modeling unit are integrated through three ensemble methods: (1) the ordinary arithmetic mean, (2) a linear ensemble, and (3) a nonlinear ensemble. These yield, respectively, the ensemble mean model (Eq. 4; hereafter EMM), the ensemble linear model (Eq. 5; hereafter ELM), and the ensemble nonlinear model, in which a classic GP is trained and verified to figure out the optimum nonlinear structure (Eq. 6; hereafter EGP), for a season-ahead TSP forecast.

$$\overline{{P_{t} }} = \frac{1}{3}\left( {P_{{t{\text{MLR}}}} \left( i \right) + P_{{t{\text{GP}}}} \left( i \right) + P_{{t{\text{GEP}}}} \left( i \right)} \right)$$
(4)
$$\overline{{P_{t} }} = m_{1} P_{{t{\text{MLR}}}} \left( i \right) + m_{2} P_{{t{\text{GP}}}} \left( i \right) + m_{3} P_{{t{\text{GEP}}}} \left( i \right) + c$$
(5)
$$\overline{{P_{t} }} = f\left( {P_{{t{\text{MLR}}}} \left( i \right),P_{{t{\text{GP}}}} \left( i \right),P_{{t{\text{GEP}}}} \left( i \right)} \right),$$
(6)

where \(\overline{{P_{t} }}\) is the ensemble model estimation and i indexes the training samples in each input/output vector. The mj (here j = 1, 2, 3) and c are, respectively, the weights of each single model and the constant component of the multivariable linear ensemble model. In this study, the optimum weights and constant were calculated by the least squares method via a multiple linear regression analysis. Previous studies have proven that the ensemble use of the outputs of several models improves model accuracy [29, 36].
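Equations (4) and (5) can be sketched as follows. The standalone-model outputs are simulated here, and the noise levels are arbitrary assumptions rather than the study's actual residuals:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated training-set outputs of the three standalone models (normalized TSP)
y     = rng.uniform(0.1, 0.9, 150)           # observed target
p_mlr = y + rng.normal(0, 0.10, 150)
p_gp  = y + rng.normal(0, 0.08, 150)
p_gep = y + rng.normal(0, 0.05, 150)

# Ensemble mean model (Eq. 4)
emm = (p_mlr + p_gp + p_gep) / 3

# Ensemble linear model (Eq. 5): weights m_j and constant c by least squares
A = np.column_stack([p_mlr, p_gp, p_gep, np.ones_like(y)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
elm = A @ w
```

On the training set, the least squares combination can never do worse than the arithmetic mean, since the equal-weight mean is one point in the space the regression searches over.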

3.6 Overview of random forest (RF)

The RF [3] is an advancement of the conventional decision tree algorithm that creates a set of random decision trees, each based on a random subset of variables from a bootstrapped data set. In RF, a wide variety of random trees is created, which makes the algorithm more effective than traditional decision trees. To predict a target variable, the corresponding features are run down all the created random trees, and the individual outputs are then aggregated (by voting in classification, or averaging in regression). This strategy has been proven to perform well in both classification and regression problems and to be robust against overfitting [26]. Therefore, the approach was selected as the benchmark in this study. For the fundamentals of RF and its applications in water resources, the reader is referred to the review paper by Tyralis et al. [37]. Details of the RF created for seasonal precipitation forecasting are presented in the results section.
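The bagging idea behind RF can be illustrated with a deliberately tiny, hand-rolled version using depth-1 regression trees (stumps). The actual study used RapidMiner with 50 trees of depth five, so everything below is a toy sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_stump(X, y):
    """Depth-1 regression tree: best (feature, threshold) split by least squares."""
    best_sse, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, thr, y[left].mean(), y[~left].mean())
    return best

def predict_stump(stump, X):
    j, thr, lo, hi = stump
    return np.where(X[:, j] <= thr, lo, hi)

def fit_forest(X, y, n_trees=50):
    """Each tree is fitted on a bootstrap sample of the rows (bagging)."""
    return [fit_stump(X[i], y[i])
            for i in (rng.integers(0, len(X), len(X)) for _ in range(n_trees))]

def predict_forest(forest, X):
    """Regression forests average the individual tree outputs."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

# Toy lagged-precipitation data
X = rng.uniform(0.1, 0.9, (200, 5))
y = 0.5 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.02, 200)
pred = predict_forest(fit_forest(X, y), X)
```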

3.7 Performance evaluation

To evaluate the models’ performance, the Nash–Sutcliffe efficiency (NSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) measures were used in this study. These indicators are commonly used for the evaluation of hydrological models (e.g., [23, 26])

$${\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} (X_{i}^{{{\text{obs}}}} - X_{i}^{{{\text{pre}}}} )^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} (X_{i}^{{{\text{obs}}}} - X_{{{\text{mean}}}}^{{{\text{obs}}}} )^{2} }}$$
(7)
$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} (X_{i}^{{{\text{obs}}}} - X_{i}^{{{\text{pre}}}} )^{2} }}{n}}$$
(8)
$${\text{MAPE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{X_{i}^{{{\text{obs}}}} - X_{i}^{{{\text{pre}}}} }}{{X_{i}^{{{\text{obs}}}} }}} \right|,$$
(9)

where \(X_{i}^{{{\text{obs}}}}\) is the observed TSP, \(X_{i}^{{{\text{pre}}}}\) is the TSP calculated by the models, and n is the number of samples.
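The three criteria translate directly into NumPy; the sample values below are invented purely for illustration:

```python
import numpy as np

def nse(obs, pre):
    """Nash–Sutcliffe efficiency (Eq. 7); 1 indicates a perfect fit."""
    obs, pre = np.asarray(obs, float), np.asarray(pre, float)
    return 1 - np.sum((obs - pre) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, pre):
    """Root mean squared error (Eq. 8), in the units of the data."""
    obs, pre = np.asarray(obs, float), np.asarray(pre, float)
    return np.sqrt(np.mean((obs - pre) ** 2))

def mape(obs, pre):
    """Mean absolute percentage error (Eq. 9); observations must be nonzero."""
    obs, pre = np.asarray(obs, float), np.asarray(pre, float)
    return np.mean(np.abs((obs - pre) / obs))

obs = [200.0, 350.0, 120.0, 400.0]   # invented seasonal totals (mm)
pre = [210.0, 330.0, 130.0, 390.0]
```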

4 Results

In this section, at first, the results of standalone modeling via the MLR, GP, and GEP techniques are provided. Then, the outcomes of the ensemble modeling stage and a discussion of the different models are presented. The ensemble models are finally compared with an RF model developed as the benchmark. It is worth mentioning that two software packages, namely GPdotNET [16] and GeneXproTools [13], were applied in this study to evolve the GP and GEP models, respectively. Both tools were trained and tested with the same input/target sample size, fitness function, and terminal sets. The best RF model was generated using the RapidMiner Studio software.

In the evolution of AI models, particularly GP and GEP, the dimensions and ranges of the inputs must be carefully adjusted before training [36]. As suggested by Tür [36], to achieve dimensionally consistent models, the TSP values were normalized into the range [0.1, 0.9] using Eq. (10). Therefore, all the results presented in this paper are normalized values.

$$P_{{{\text{nt}}}} = 0.8 \times \frac{{P_{t} - P_{{\min}} }}{{P_{{\max}} - P_{{\min}} }} + 0.1.$$
(10)
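Equation (10) and its inverse, which is needed to report forecasts back in mm, can be sketched as follows; the sample totals are illustrative:

```python
import numpy as np

def normalize(p, p_min, p_max):
    """Eq. (10): scale precipitation into the range [0.1, 0.9]."""
    return 0.8 * (p - p_min) / (p_max - p_min) + 0.1

def denormalize(p_n, p_min, p_max):
    """Invert Eq. (10) to recover the original units."""
    return (p_n - 0.1) / 0.8 * (p_max - p_min) + p_min

tsp = np.array([55.0, 242.6, 610.0, 980.0])   # illustrative seasonal totals (mm)
tsp_n = normalize(tsp, tsp.min(), tsp.max())
```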

4.1 Results of standalone linear and nonlinear models

Table 2 presents the efficiency results of the best evolved MLR, GP, and GEP models. The corresponding explicit expressions are presented in Eqs. (11)–(13), respectively.

$$P_{t} = 0.271P_{t - 9} + 0.206P_{t - 6} + 0.205P_{t - 3} - 0.16P_{t - 2} - 0.155P_{t - 1} + 0.159$$
(11)
$$\begin{aligned} P_{t} & = (((\cos ((((\tan (P_{t - 9} )) \times (0.38016 - 0.38831)) \\ & \quad \times (\cos ((P_{t - 1} - 0.5559 )))))) \times ({\text{sqrt}}((P_{t - 9} \\ & \quad \times ({\text{sqrt}}((P_{t - 6} \times P_{t - 3} ))))))) \times (\cos ((P_{t - 9} \\ & \quad \times ({\text{sqrt}}((P_{t - 9} \times (0.38016/0.38831)))))))) \\ \end{aligned}$$
(12)
$$\begin{aligned} P_{t} & = [(0.0064((\exp ((0.2418/P_{t - 3} ))^{2} ) \times 0.2418))^{2} ] \\ & \quad + [(0.7612^{2} ) \times (\cos \left( {\left( {P_{t - 6} + P_{t - 3} } \right)} \right) \times (\left( {0.4608P_{t - 6} } \right) \\ & \quad + P_{t - 3} ))] + [((\cos \left( {P_{t - 9} } \right) \times \cos \left( {P_{t - 6} } \right)) \times (\left( {0.1566P_{t - 9} } \right) \\ & \quad + P_{t - 6} )) \times P_{t - 9} ]. \\ \end{aligned}$$
(13)
Table 2 Performance indices calculated for the standalone models

According to the efficiency results, the GEP model provides the highest accuracy compared with its counterparts in both the training and testing periods. The standalone GP model is also superior to the MLR. This is not a surprising result, as the former uses a nonlinear structure with more complicated functions that capture the complex relationship between the TSP and its antecedent values. From a model complexity perspective, the evolved GEP model is the most complex one, as it uses more functional nodes. To further assess the models, the scatter plots of forecasts versus observations are presented in Fig. 9. According to the figure, the MLR overestimates the lower TSP values in the testing period. By contrast, the GP underestimates the corresponding values. Compared with the MLR and GP results, the observations and the corresponding GEP estimations are distributed more closely around the 1:1 line, indicating its superiority.

Fig. 9
figure 9

Scatter plots of the standalone model forecasts versus observed normalized TSP in the training (top panel) and testing (bottom panel) periods

4.2 Results of ensemble models

As previously mentioned, in the ensemble modeling unit the results of the MLR, GP, and GEP models are combined through two linear schemes and one nonlinear scheme. Thus, the ensemble results are not sensitive to the limits of a standalone regression model. To attain the optimum linear and nonlinear ensemble models in this stage, new MLR and GP models were developed using the results of the standalone models as inputs. As in the standalone GP, the GPdotNET package was used in this stage, where the algorithm was trained to minimize the RMSE as the fitness function, and the maximum depth of genes and the best functions were obtained via a trial-and-error process.

The evolved ensemble GP models were also compared with the RF model developed as the benchmark in this study. Like the ensemble GP models, RF is an ensemble of several tree structures. Here, several RF models were generated and the best one was selected. To this end, forests of 2 to 1000 random trees were generated with depths varying within the range 5 to 15. For each tree, a subset of examples was selected via bootstrapping and optimized with regard to the least squares error criterion. The trials showed that the best model is attained with 50 trees, each of which has a depth of five. The results showed that a higher number of trees or greater depth increases the model performance in the training set only insignificantly, with no improvement in the validation data. Table 3 compares the efficiency results of the best evolved EMM (Eq. 14), ELM (Eq. 15), and EGP (Fig. 10) models with the best evolved RF model.

$$P_{t} = \left( {P_{{t\_{\text{MLR}}}} + P_{{t\_{\text{GP}}}} + P_{{t\_{\text{GEP}}}} } \right)/3$$
(14)
$$P_{t} = 0.0729 P_{{t\_{\text{MLR}}}} + 0.1036P_{{t\_{\text{GP}}}} + 0.7746P_{{t\_{\text{GEP}}}} + 0.01.$$
(15)
Table 3 Performance indices calculated for the ensemble models
Fig. 10
figure 10

Optimum GP tree for the nonlinear ensemble model EGP

According to the results, the EGP performed better than the other ensemble models in terms of all the statistical efficiency measures. The improvement was more noticeable when the testing samples were considered. The EGP, with a training (testing) NSE of 0.73 (0.67), is much more powerful in TSP forecasting than the ELM and EMM models with NSEs of 0.61 (0.56) and 0.57 (0.47), respectively. These values indicate that the EGP is 20% and 28% superior to the ELM and EMM models, respectively. Compared with the state-of-the-art RF model, the results also indicated the superiority of the proposed EGP model. The lower generalizability of the RF was found to be its main drawback in this study. As illustrated in Fig. 10, the EGP can be expressed in a single tree structure; however, the best RF, which is the combination of 50 trees, cannot.

Figure 11 exhibits the scatter plots of the ensemble model forecasts versus the observed normalized TSP. As seen in the plots, the EGP forecasts are closer to the 1:1 line, showing that it predicts the TSP better than the ELM and EMM models. It is noted that the EGP is also more powerful than the MLR, GP, GEP, and RF models in TSP forecasting. In this model, the normalized TSP is predicted only from the knowledge of historical precipitation events. The model does not depend on large-scale climate indicators and is therefore attractive for use by practitioners.

Fig. 11
figure 11

Scatter plots of the estimations of the ensemble models vs. observed normalized TSP in the training (top panel) and testing (bottom panel) data sets

Figure 12 illustrates the distribution of the observed TSP and the ensemble models’ forecasted TSP over the testing period (1990–2014) by displaying their quantiles and skewness. The figure demonstrates that all the models produced the same median, corresponding to an average seasonal precipitation of 242.6 mm. The EGP is superior to its counterparts in forecasting the precipitation amount in low and extreme rainy seasons. By contrast, the RF model shows better performance in predicting within the interquartile range.

Fig. 12
figure 12

Boxplot of normalized observed and forecasted seasonal precipitation for Antalya. The boxes indicate the 25th, 50th, and 75th percentiles. The ends of the whiskers indicate the lowest (highest) TSP within 1.5 times the interquartile range of the lower (upper) quartile. Cross-mark denotes extreme precipitation season considered as outlier

5 Discussion

The results showed that combining the outcomes of classic and symbolic regression models may enhance the precision of TSP forecasts. This agrees well with the findings of previous studies such as Rahmani-Rezaeieh et al. [31] and Danandeh Mehr and Safari [10], where the ensemble GP approach was introduced as a robust modeling technique in hydrological studies. In comparison with simple averaging and the linear ensemble, the TSP values forecasted by the new EGP model were closer to the observed data in both the training and testing samples.

As noted previously, each of the evolved models has specific advantages and shortcomings. The MLR is simple and easy to develop, use, and interpret, but it underestimates heavy rainfall seasons. The GP and GEP models always produce better forecasts than the MLR, but they are more complex. The ensemble models lead to higher performance by combining the results, so that the underestimation problem is mitigated. As can be seen from Tables 2 and 3, the ability of the nonlinear ensemble method, i.e., EGP, is higher than that of the linear and arithmetic averaging ensembles. In terms of NSE, the best nonlinear ensemble technique (EGP) improved the modeling efficiency of the best standalone technique (GEP) by 25% and 35% in the training and testing periods, respectively. It is worth mentioning that the performance results tabulated in Tables 2 and 3 are averages over the entire time series modeled and validated in the training and testing periods. Therefore, it is difficult to quantify the models’ performance in each season. To address this problem, the residuals in each season were calculated, and the sum of the normalized absolute error of each model in each season is depicted in Fig. 13. It is seen that all the models show better performance in spring, followed by autumn.

Fig. 13
figure 13

Sum of the normalized absolute error of the ensemble models in different seasons

Returning to the relevant literature, it is seen that the implementation of AI-based models in precipitation forecasting has received more attention in recent years. However, the existing studies mostly focused on daily or monthly scenarios. The longer the forecasting horizon, the lower the forecasting accuracy. For example, the standalone GEP model suggested by Mehr [25] for monthly rainfall forecasting at two rain gauge stations in Iran produced maximum NSE values of 0.53 in the testing period. Such performance was considered good accuracy for monthly rainfall forecasting. For a seasonal forecasting scenario, such a level of accuracy can be considered desirable [28]. Thus, the proposed EGP model is an appropriate alternative for seasonal precipitation forecasting.

6 Summary and conclusion

A three-stage hybrid modeling approach was suggested for TSP prediction in this study. The model was evolved and verified using long-term observed precipitation from Antalya, Turkey. To this end, first, MLR-, GP-, and GEP-based models were developed and compared with each other. Afterward, three ensemble techniques were employed to enhance the forecasting accuracy. In the ensemble modeling stage, the outputs of the best evolved MLR, GP, and GEP models were used as the TSP predictors. The results were three hybrid ensemble models, namely EMM, ELM, and EGP, in which both linear and nonlinear combination schemes were considered. The GP and GEP models used in the present study are emerging AI techniques known as grey-box models with explicit tree-like model structures. The standalone modeling stage showed that ad hoc GP and GEP are not accurate enough for TSP forecasting. This agrees with findings in the relevant literature in which GEP was reported as an imprecise engine for monthly precipitation forecasting [25]. Comparing the efficiency results of the ensemble models with those of the RF, it was shown that the EGP was the best ensemble model. It outperformed all the standalone and ensemble models, which implies that the combination of different modeling outputs can reduce the uncertainties of individual models. The models evolved in the present study were limited to the historical TSP as predictors. The evolution of similar models using large-scale climate indicators is suggested as a topic for future studies.