Introduction

A major concern in the field of energy management is to have accurate estimations of the electricity demand. Accordingly, a wide variety of methods have been used by researchers to tackle this issue. In this context, autoregressive integrated moving average (ARIMA) is one of the most widely used approaches. This method is usually utilized for the modeling of stochastic disturbances in time series analysis (Box and Jenkins 1970). The ARIMA technique has been applied to different energy problems such as prediction of the customer short-term load and domestic electric energy consumption (Cho et al. 1995; Abdel-Aal and Al-Garni 1997; Ediger and Akar 2007). Moreover, several researchers used double seasonal ARIMA for electricity demand forecasting (Suhartono and Endharta 2009; Mohamed et al. 2010; Taylor 2003, 2010). Besides, conventional approaches such as multiple linear regression (MLR) are widely used for the energy demand estimation (Kandananond 2011). However, ARIMA, MLR, and other statistical analyses are based on defining the linear or nonlinear structure of the model in advance, which is not always true (Alavi and Gandomi 2011a; Mostafavi et al. 2013a).

Soft computing techniques are considered as alternative to traditional methods for tackling real-world problems. They automatically learn from experience and extract various discriminators (Mitchell 1997). Artificial neural networks (ANNs) are one of the widely used branches of soft computing. ANNs and other soft computing techniques have been successfully applied to different real world problems including industrial engineering and energy management (e.g., Abdel-Aal et al. 1997; Miranda et al. 1998; Pino et al. 2000, 2008; Annunziato et al. 2004, 2006; Tien Pao 2007; Srinivasan 2008; Tsujimura et al. 1997; Abdel-Aal 2008; Moghrabi and Eid 1998; Currie 1992; Daim et al. 2010; Alavi and Gandomi 2011b; Pizzuti et al. 2013; Annunziato et al. 2013; Najafzadeh et al. 2013; Najafzadeh and Lim 2014; Najafzadeh and Azamathulla 2013a, b; Kaydani et al. 2014; Najafzadeh et al. 2012, 2014a, b). ANNs have been used to predict the electricity demand in different countries such as Taiwan (Hsu and Chen 2003), Ireland (Ringwood et al. 2001), Spain (Catalao et al. 2007), Saudi Arabia (Abdel-Aal 2008), and Iran (Azadeh et al. 2007, 2008a, b; Kheirkhah et al. 2013). Support vector machine (SVM) is another soft computing technique that has been successfully applied to predict the electric consumption (Fan and Chen 2006; Hong 2009). Despite their good performance, ANNs and SVM are considered as black-box models. That is, they are not capable of generating practical prediction equations. The structure of ANNs should be defined in advance, which limits their practicability (Alavi and Gandomi 2011a; Yang et al. 2012; Mostafavi et al. 2013a, b).

In order to cope with the limitations of the existing methods, a robust soft computing approach, namely genetic programming (GP) is introduced (Koza 1992). In fact, GP uses the principle of Darwinian natural selection to generate computer programs for solving a problem. GP has several advantages over the conventional and ANN techniques. A notable feature of GP is that it can produce practical prediction equations without a need to pre-define the form of the existing relationship (Tay and Ho 2008; Alavi et al. 2011a, b; Can and Heavey 2011, 2012; Gandomi and Alavi 2011, 2013; Alavi and Gandomi 2012; Mostafavi et al. 2013b). GP and its variants have been shown to be powerful tools for the electricity demand prediction (Lee et al. 1997; Bhattacharya et al. 2001). Multi-gene genetic programming (MGGP) (Searson et al. 2007, 2010) is a robust variant of GP. MGGP is designed to generate mathematical models of predictor response data that are “multi-gene” in nature, i.e., linear combinations of low-order nonlinear transformations of the input variables. The traditional GP representation is based on the evaluation of a single tree (model) expression. In multi-gene representation, a single GP individual (program) is constructed from a number of genes, each of which is a tree expression (Searson et al. 2010; Gandomi and Alavi 2012a, b). Despite remarkable prediction capabilities of the MGGP approach (Searson et al. 2010; Gopalakrishnan et al. 2010; Desai and Shaikh 2012; Gandomi and Alavi 2012a, b), it has not been yet applied to solving problems in the field of energy conversion and management.

This study proposes a new MGGP approach to derive a prediction model for the electricity demand. The data for total electricity demand in Thailand from 1986 to 2009 were used for the model development. The results provided by the developed MGGP model were further compared with those obtained by other existing methods. The paper is organized as follows: The “Multi-gene genetic programming” section presents brief descriptions of the MGGP technique. The “Methodology” section outlines the model development using MGGP and reviews the results. The detailed performance analysis of the proposed model is discussed in the “Performance analysis” section. The results of the sensitivity analysis are given in the “Sensitivity analysis” section. Finally, concluding remarks are outlined in the “Conclusion” section.

Multi-gene genetic programming

GP creates computer programs to solve a problem by simulating the biological evolution of living organisms (Koza 1992). The genetic operators of genetic algorithm (GA) and GP are almost the same. The difference between GA and GP is that the former gives the solution as a string of numbers while the solution generated by the latter are computer programs represented as tree structures (Koza 1992; Gandomi and Alavi 2012a, b).

Figure 1 shows GP in the context of the input-processing-output (IPO) model (Weise 2009). As it is seen, inputs and corresponding output data samples are known in GP and the main goal is to find a program that connects them. In GP, a random population of computer programs is created to obtain high diversity. Each program evolved by GP is a structured tree composed of functions (e.g., +, −, ×, /, etc.) and terminals (e.g., numerical constants, logical constants, variables, etc.). The tree-like structure of a GP model is constructed by randomly choosing the functions and terminals. This structure has a root point with branches extending from each function and ending in a terminal. An example of a simple tree representation of a GP program is demonstrated in Fig. 2. Comprehensive descriptions of the GP algorithm can be found in Koza (1992); Alavi and Gandomi (2011a); Gandomi and Alavi (2012a, b).

Fig. 1
figure 1

GP in the context of the input-processing-output model

Fig. 2
figure 2

Tree representation of a GP model

MGGP (Searson et al. 2007, 2010; Searson 2009) is a new variant of GP. The traditional GP representation is based on the evaluation of a single tree (model) expression. In MGGP, a single GP individual (program) is derived from a number of genes, each of which is a tree expression (Searson et al. 2007, 2010; Searson 2009). In other words, each model evolved by MGGP is a weighted linear combination of the outputs from a number of GP trees. The tress are called “gene”. Figure 3 shows a typical program evolved by MGGP. The inputs of the model are a, b, and c and the functions used for the evolution process are ×, −, +, Log, and √. The model is linear in the parameters with respect to the coefficients α0, α1, and α 2 despite using nonlinear terms. As it is seen, the evolved model is a linear combination of nonlinear transformations of the predictor variables (Searson et al. 2007, 2010; Gandomi and Alavi 2012a, b). Two important MGGP parameters that need notable control are the maximum allowable number of genes and maximum tree depth. Restricting the tree depth mostly results in generating more compact models (Searson et al. 2007, 2010; Gandomi and Alavi 2012a, b).

Fig. 3
figure 3

A typical program evolved by MGGP

In order to obtain the linear coefficients an ordinary least squares analysis is performed on the training data. Besides, it is possible to embed multi-gene approach within a partial least squares method (Searson et al. 2007, 2010; Gandomi and Alavi 2012a, b). The initial population generated by MGGP contains GP trees with different randomly generated genes. In addition to traditional GP’s recombination operators, MGGP uses a tree crossover operator, called two-point high-level crossover to acquire and delete the genes (Searson et al. 2007, 2010; Gandomi and Alavi 2012a, b). As an example, assume that two parent programs evolved by MGGP contain two (Gene 1 Gene 2) and three genes (Gene 3 Gene 4 Gene 5). The genes enclosed by the crossover points are denoted by {} as follows: (Gene 1 {Gene 2}) and (Gene 3 {Gene 4 Gene 5}). Thus, during the crossover operation the genes are exchanges to create two new programs: (Gene 1 Gene 4 Gene 5) and (Gene 3 Gene 2). In MGGP, standard GP sub-tree crossover is referred to as low level crossover. In this case, a gene is chosen at random from each parent individual. Then, the standard sub-tree crossover is applied and the created trees replace the parent trees in the unaltered individual in the next generation. Moreover, there are different types of mutation in MGGP such as sub-tree mutation, mutation of constants using an additive Gaussian perturbation, and set a randomly selected constant to zero. Further details about MGGP can be found in (Searson et al. 2007, 2010; Gandomi and Alavi 2012a, b).

Methodology

The steps followed by the soft computing techniques to find optimal models are generally similar. A methodology similar to that successfully used in previously published studies was considered to derive a precise MGGP-based prediction model for the electricity demand (Azadeh et al. 2008b; Mostafavi et al. 2013b). The steps followed to derive the model were as follows:

  1. I.

    The input variables affecting the electricity demand were selected.

  2. II.

    Annual energy data of Thailand from 1986 to 2009 were collected.

  3. III.

    The gathered database was divided in to training and testing data. The training data were taken for the learning process. The testing data were used to measure the performance of the models obtained by MGGP on data that played no role in building the models (model validation) (Mostafavi et al. 2013b).

  4. IV.

    MGGP was run on the training data to find a computer program that connects the input variables to the output (annual electricity demand).

  5. V.

    The best MGGP model was chosen considering both its simplicity and the best performance on the training data.

  6. VI.

    The best MGGP model was run for the testing data to prove its generalization capability when dealing with unseen data in its future applications.

Figure 4 shows the steps of the proposed methodology for developing a prediction model for the electricity demand.

Fig. 4
figure 4

Steps of the proposed methodology for developing a prediction model for the electricity demand

The performance measures used for in this study were correlation coefficient (R), root mean squared error (RMSE), and mean absolute percent error (MAPE):

$$ R=\frac{\left({\displaystyle {\sum}_{i=1}^n\left({h}_i-\overline{h_i}\right)\left({t}_i-\overline{t_i}\right)}\right)}{\sqrt{{\displaystyle {\sum}_{i=1}^n{\left({h}_i-\overline{h_i}\right)}^2}{\displaystyle {\sum}_{i=1}^n{\left({t}_i-\overline{t_i}\right)}^2}}} $$
(1)
$$ \mathrm{RMSE}=\sqrt{\frac{{\displaystyle \sum_{i=1}^n{\left({h}_i-{t}_i\right)}^2}}{n}} $$
(2)
$$ \mathrm{MAPE}=\frac{1}{n}{\displaystyle \sum_{i=1}^n\left[\frac{\left|{h}_i-{t}_i\right|}{h_i}\right]} $$
(3)

where, h i and t i are, respectively, the actual and predicted output values for the ith output, \( \overline{h_i} \) and \( \overline{t_i} \) are, respectively, the average of the actual and predicted outputs, and n is the number of samples. The R value is not solely a descriptive indicator of prediction accuracy. This is because by shifting the output values of a model equally, R would not change. That is why the RMSE and MAPE measures were also included for the performance evaluation. Evidently, higher R values and lower RMSE and MAPE values indicate a more precise model (Gandomi et al. 2011a).

The medium- and long-term electricity demand prediction models have a wide variety of application including evaluation of the capacity of generation, transmission, and the type of facilities required in transmission expansion planning, power plant construction scheduling, etc. (Lee et al. 1997; Henriksson et al. 2013; Mostafavi et al. 2013a). This study was dedicated to presenting a new approach for the long-term electricity demand prediction. Four effective parameters were considered as inputs of the MGGP models to make a more comprehensive prediction of the electricity demand. Selection of the input parameters was based on their popularity in the literature (Hsu and Chen 2003; Ediger and Akar 2007; Kandananond 2011; Mostafavi et al. 2013a). Finally, the formulation of the annual electricity demand (E) (GWh) was considered to be as follows:

$$ E = f\left(P,\mathrm{G}\mathrm{D}\mathrm{P},\mathrm{S}\mathrm{I},\mathrm{T}\mathrm{R}\right) $$
(4)

where, P, GDP, SI, and TR represent the yearly values of the population, gross domestic product, stock index (SET index), and total revenue from exporting industrial products (export) (million baht), respectively.

Data preprocessing

The proposed model was developed upon a reliable database containing the energy data of Thailand from 1986 to 2009 (Kandananond 2011). Table 1 presents the descriptive statistics of the input and output variables. The proposed models are applicable to the ranges shown in this table.

Table 1 Descriptive statistics of the variables included in the analysis

Of the available data sets, 18 data vectors (75 %) were taken for the training process and the remaining 6 data sets (25 %) were used as the testing data. The selection strategy was based on the consistency of the parameters in the training and testing data sets with regard to some statistical parameters (Alavi et al. 2011a, b; Gandomi et al. 2011a, b).

MGGP-based formulation of electricity demand

Several runs were conducted to obtain the best parameterization for MGGP. Various parameters are involved in the MGGP predictive algorithm. These parameters are selected are based on both some previously suggested values (Searson et al. 2010; Gopalakrishnan et al. 2010; Desai and Shaikh 2012; Gandomi and Alavi 2012a, b) and after making several preliminary runs and observing the performance. The parameter settings are shown in Table 2. In this study, basic arithmetic operators and mathematical functions are utilized to get the optimum MGGP models. The number of programs in the population is set by the population size. The number of generation sets the number of levels the algorithm uses before the run terminates (Searson et al. 2010; Gandomi and Alavi 2012a, b). The proper number of population and generation often depends on the complexity of problems and on the number of possible solutions. A fairly large number of population and generations are tested to find models with minimum error. The programs are run until the runs automatically terminated. The maximum allowable number of genes in an individual and the maximum tree depth directly influence the size of the search space and the number of solutions explored within the search space (Searson et al. 2010; Gandomi and Alavi 2012a, b). The success of the MGGP algorithm usually increases with increasing these parameters. In this case, the complexity of the evolved function increases and the speed of the algorithm decreases. The allowable number of genes and tree depth are respectively set to optimal values as tradeoffs between the running time and the complexity of the evolved solutions (Gandomi and Alavi 2012a, b). There are 3 × 3 × 3 × 2 × 2 × 2 = 216 different combinations of the parameters. All of these parameter combinations are tested and two replications for each were carried out. Therefore, the overall number of optimal individual runs is equal to 216 × 2 = 432. GPTIPS toolbox (Searson 2009), in conjunction with subroutines coded in MATLAB, is used to implement MGGP. Fitness function evaluates the evolved expressions to designate the best encoded expressions (Gandomi and Alavi 2012a, b). The default GPTIPS multi-gene symbolic regression function is used to minimize the error (root mean squared error). The best MGGP models are chosen on the basis of the providing the best fitness value on the training data as well as the simplicity of the models (Gandomi and Alavi 2012a, b).

Table 2 Parameter settings for the MGGP algorithm

The optimal MGGP-based formulation of the electricity demand (E) is as follows:

$$ \begin{array}{cc}\hfill E\left(\mathrm{G}\mathrm{W}\mathrm{h}\right)=\hfill & \hfill 385.9+0.0002098\mathrm{P}-0.05322\mathrm{G}\mathrm{D}\mathrm{P}+0.05826\mathrm{T}\mathrm{R}\hfill \\ {}\hfill \hfill & \hfill -\frac{1.515\mathrm{S}\mathrm{I}\left(\mathrm{G}\mathrm{D}\mathrm{P}-15.66\mathrm{S}\mathrm{I}\right)}{10^7}\hfill \\ {}\hfill \hfill & \hfill +\frac{6.994\mathrm{G}\mathrm{D}\mathrm{P}\left(3\mathrm{G}\mathrm{D}\mathrm{P}+7.831\mathrm{S}\mathrm{I}-\mathrm{T}\mathrm{R}\right)}{10^{10}}\hfill \\ {}\hfill \hfill & \hfill +\frac{8.755\mathrm{P}\left(\mathrm{G}\mathrm{D}\mathrm{P}-\mathrm{T}\mathrm{R}\right)}{10^{10}}-\frac{\left(8.755\mathrm{S}\mathrm{I}\left(\mathrm{G}\mathrm{D}\mathrm{P}-\mathrm{T}\mathrm{R}\right)\right)}{10^{10}}\hfill \\ {}\hfill \hfill & \hfill \hfill \\ {}\hfill \hfill & \hfill \hfill \end{array} $$
(5)

in which, P, GDP, SI, and TR represent the inputs variables. Figure 5 shows the measured versus predicted E values using the MGGP model. The numbers of population, generations, genes, and head size for the optimal run were equal to 200, 200, 6, and 5, respectively. As it is seen, the performance of the model on the testing data is better than training data. Figure 6 shows the variation of the best (log values) and mean fitness with the number of generations. It can be observed from this figure that the fitness value decreases with increasing the number of generations. The best fitness is found at the 195th generation. The statistical significance of each of the three genes of the derived model is visualized in Fig. 7. According to Fig. 7, the weight of the bias term is higher than the other genes. Figure 7 depicts the degree of significance of each gene evaluated using p values. As it is seen, except for the bias and fourth gene (Gene 4), the contribution of the genes to explain variations in E is very high, as their relevant p values are very low and are near 0. The statistical significance of Gene 1, Gene 3, Gene 5, and Gene 6 is lower than the bias term and other genes.

Fig. 5
figure 5

Measured versus predicted electricity demand using the MGGP model a training data, b testing data

Fig. 6
figure 6

Variation of the best and mean fitness with the number of generations for MGGP

Fig. 7
figure 7

Statistical properties of the evolved MGGP model (on training data)

Fig. 8
figure 8

A comparison of the electricity demand predictions made by different models

Performance analysis

Smith (1986) suggested the following criteria for judging performance of a model:

  • if a model gives |R| > 0.8, a strong correlation exists between the predicted and measured values.

In all cases, the error values (e.g., RMSE, MAPE) should be at the minimum. It can be observed from Fig. 5 that the MGGP model provides very good predictions both for the training (R = 0.999, RMSE = 389.913, MAPE = 1.461) and testing (R = 0.997, RMSE = 317.968, MAPE = 0.425) data. Besides, new criteria recommended by Golbraikh and Tropsha (2002) were checked for external validation of the models on the validation data sets. It is suggested that at least one slope of regression lines (k or k') through the origin should be close to 1. It should be noted that k and k' are the slopes of regression lines between the regressions of actual (h i ) against predicted output (t i ) or t i against h i through the origin, i.e., h i  = k t i and t i  = k' h i , respectively. Also, the performance indexes of m and n should be lower than 0.1. Recently, Roy and Roy (2008) introduced a confirm indicator (R m) of the external predictability of models. For R m > 0.5, the condition is satisfied. Either the squared correlation coefficient (through the origin) between predicted and experimental values (Ro2), or the coefficient between experimental and predicted values (Ro′2) should be close to R 2, and to 1 (Alavi et al. 2011a, b). The considered validation criteria and the relevant results obtained by the models are presented in Table 3. As it is seen, the derived model satisfies the required conditions. The validation phase ensures the derived model is strongly valid.

Table 3 Statistical parameters of the MGGP model for the external validation

In order to have an idea about the predictive power of the MGGP model, its performance was compared with that of a conventional and two powerful soft computing-based models. For this aim, traditional regression (MLR) (Kandananond 2011), ANN (Mostafavi et al. 2013a), and hybrid genetic programming-simulated annealing (GSA) (Mostafavi et al. 2013a) models were considered. The MLR model proposed by Kandananond (2011) is as follows:

$$ E\left(\mathrm{G}\mathrm{W}\mathrm{h}\right)=-91411+0.00170\times P+0.00794\times \mathrm{G}\mathrm{D}\mathrm{P}-2.57\times \mathrm{S}\mathrm{I}+0.00114\times \mathrm{T}\mathrm{R} $$
(6)

The best ANN architecture for the estimation of E had one input layer with four arguments (P, GDP, SI, and TR), one output layer with 1 node providing the value of E and one hidden layer having 15 nodes. Log-sigmoid was adopted as the transfer function between the input-hidden and hidden-output layers. Also, the ANN model was built with a learning rate of 0.05 and trained for 1500 epochs (Mostafavi et al. 2013a). It should be noted that GSA and MGGP are totally different evolutionary approaches. In GSA, only a single computer program is initially created at random. Then, the essential role of SA in the integrated GP and SA algorithm is selection of new computer programs and optimizing the evolutionary process to find the optimal model (Alavi et al. 2010). On the other hand, MGGP is working based on a population of computer programs and follows the basic GP evolutionary process. The final model evolved by MGGP is a weighted linear combination of the outputs from a number of GP trees (computer programs). Figure 8 presents a comparison of the predictions made by the MGGP, MLR, ANN, and GSA models. It can be observed from this figure that the results provided by the proposed model are a significant improvement over those provided by the MLR model. Besides, the MGGP model performs superior to the ANN and GSA models. It was not possible to include the ARIMA model proposed by Kandananond (2011) in the comparative study because the predicted electricity demand values were not presented in that research. However, Kandananond (2011) found that the ANN model had a notably better performance than the ARIMA model. Since the MGGP model performs superior to the ANN model, it apparently outperforms the ARIMA model.

As it is seen, ANN provides good results but a significant limitation of this method is that it usually does not provide practical prediction equations. Thus, it is very difficult for practitioners to utilize and interpret an ANN model (Kandananond 2011). However, it should be noted that only if one hidden layer is considered for developing the ANN models, it is possible to convert them into a functional form. Even in this case, the derived equations are very complicated as they are based on all of the connection weights between the input, hidden, and output layers (Alavi and Gandomi 2011b). Another important point is that the MLR model provides acceptable predictions only for years 2008 and 2009. A rational reason for this behavior of the MLR model is that such conventional statistical analyses assume a linear relationship between the outcome and the predictor variables, which is not always true. In most cases, the best models developed using the commonly used statistical approaches are obtained after controlling only some equations established in advance. Thus, such models cannot efficiently consider the interactions between the dependent and independent variables. On the other hand, the trends shown in Fig. 8 confirm that the performance of the proposed MGGP model is very good for all years. This is because the best solutions provided by this technique are determined after controlling numerous preliminary models, even billions of linear and nonlinear models (Alavi and Gandomi 2011a; Alavi et al. 2011a, b; Mostafavi et al. 2013b). A notable limitation of GP and its variants such as MGGP is that these methods are parameter sensitive. The best models are usually obtained after a notable number of runs with different combinations of the parameters. However, this process may be optimized by using any form of optimally controlling the parameters of the run (e.g., GAs) (Alavi et al. 2010).

Sensitivity analysis

In order to evaluate the importance of the input parameters to the prediction of the electricity demand, their frequency values (Gandomi et al. 2011a, b) were obtained. A frequency value equal to 100 for an input indicates that this input variable has been appeared in 100 % of the best 30 programs evolved by MGGP (Gandomi et al. 2011a, b). The frequency values of the predictor variables are presented in Fig. 9. According to this figure, the electricity demand is notably sensitive to all of the considered predictor variables. However, the electricity demand seems to be more influenced by P and GDP compared to SI and TR. The results are in close agreement with those reported by other researchers (Mostafavi et al. 2013a).

Fig. 9
figure 9

Contributions of the predictor variables in the MGGP analysis

Conclusion

This study presents a novel application of MGGP for the empirical modeling of the electricity demand in Thailand based on historical data from 1986 to 2009. This case study illustrated the success of the MGGP technique for the prediction of the electricity demand. The validation of the derived model was verified using different criteria. The model has a very good performance on both training (R = 0.999, RMSE = 389.913, MAPE = 1.461) and testing data (R = 0.997, RMSE = 317.968, MAPE = 0.425). Besides, MGGP has produced better outcomes than the MLR and two soft computing methods (ANN and GSA). The main advantage of MGGP over traditional methods such as MLR and ARIMA is that there is no predefined function to be considered for the modeling of the electricity demand. As expected, the results of the sensitivity analysis indicate that the electricity demand is more affected by population and GDP. The model can be easily retrained and improved to make more accurate predictions for a wider range by including the data for other years. Further research can focus on identifying other predictor variables and incorporating them into the modeling process. For instance, visibility, water vapor pressure, and wind speed may directly be included into the analysis in addition to the parameters considered in this study.