Elsevier

Energy Conversion and Management

Volume 84, August 2014, Pages 334-345

The problem of multicollinearity in horizontal solar radiation estimation models and a new model for Turkey

https://doi.org/10.1016/j.enconman.2014.04.035

Highlights

  • Impacts of multicollinearity on solar radiation estimation models are discussed.

  • Accuracy of existing empirical models for Turkey is evaluated.

  • A new non-linear model for the estimation of average daily horizontal global solar radiation is proposed.

  • Estimation and prediction performance of the proposed and existing models are compared.

Abstract

Due to the considerable decrease in energy resources and increasing energy demand, solar energy is an appealing field of investment and research. There are various modelling strategies and particular models for estimating the amount of solar radiation reaching a particular point on the Earth. In this article, global solar radiation estimation models are taken into account. To emphasize the severity of the multicollinearity problem in solar radiation estimation models, some of the models developed for Turkey are revisited. It is observed that these models have been identified as accurate under certain multicollinearity structures, and when the multicollinearity is eliminated, the accuracy of these models is open to question. Thus, a reliable model that does not suffer from multicollinearity and gives precise estimates of global solar radiation for the whole region of Turkey is necessary. A new nonlinear model for the estimation of average daily horizontal solar radiation is proposed, making use of the genetic programming technique. There is no multicollinearity problem in the new model, and its estimation accuracy is better than that of the revisited models in terms of numerous statistical performance measures. According to the proposed model, temperature, precipitation, altitude, longitude, and monthly average daily extraterrestrial horizontal solar radiation have a significant effect on the average daily global horizontal solar radiation. Relative humidity and soil temperature are not included in the model due to their high correlation with precipitation and temperature, respectively. While altitude has the highest relative impact on the average daily horizontal solar radiation, the impact of temperature is greater than that of both longitude and precipitation.

Introduction

Solar energy is an important renewable energy source. It has essential effects on environmental processes. Therefore, it is directly related to indicators of a country's development, such as cereal yield, crop production, forest area, gas emissions, and energy production. Because it is possible to convert solar energy to electricity without contributing to climate change or damaging water resources, it seems more beneficial than other methods of electricity generation and attracts great attention [1]. Accurate estimation of the amount of solar radiation absorbed or reflected at certain points on the Earth is an appealing research area. Throughout the article, we refer to “horizontal global solar radiation” as “solar radiation” unless otherwise stated.

Scientists use various statistical modelling tools to estimate the amount of solar radiation, and compare proposed models to identify the most accurate one. Because there are many meteorological and geographical variables, terrestrial factors, and types of models that can be used in solar energy modelling, identifying an accurate model is a very complex task. Some of those variables are temperature, soil temperature, precipitation, relative humidity, cloudiness, cloud types, sunshine duration, longitude, latitude, altitude, evapotranspiration, the Earth’s distance from the sun, solar elevation angle, aerosol concentration in the atmosphere, ozone concentration, air pollution, and the ground albedo [2]. Linear, trigonometric, polynomial, or logical models with main effect and interaction terms can be used to model the amount of solar radiation reaching a particular point on the Earth. One group of models includes large scale solar energy atlases such as the European Solar Radiation Atlas (ESRA), the Solar Radiation Potential Atlas (GEPA), and the TEKNOLOGIS (see Demirhan et al. [3] for details). Each of these atlases has a specific estimation model and algorithm, and covers large regions. Another group of models has been developed for a particular region of the Earth or for a limited time span. Many such models exist for individual cities of Turkey. Dinçer et al. [4] proposed a model for the site of Gebze, Turkey. Togrul and Onat [5] introduced a model for the estimation of solar radiation for Elazig, Turkey. Sen and Sahin [6] proposed a cumulative semivariogram approach for the month of January over 29 cities of Turkey. Saylan et al. [7] presented solar radiation estimates for three cities of Turkey. Sozen et al. [8] gave solar radiation estimates for 12 cities of Turkey by using artificial neural networks (ANN). Sozen et al. [9], [10] proposed other models for Turkey over 17 meteorological stations using ANN. Sozen et al. [11] gave a modelling strategy over data of 18 cities of Turkey. Menges et al. [12] proposed a model for Konya, Turkey. Senkal and Kuleli [13] introduced a model for 12 cities of Turkey. Senkal [14] focused on nine cities of Turkey. Senkal et al. [15] gave solar radiation estimates for two cities of Turkey. Koca et al. [16] proposed a model for seven cities from the Mediterranean region of Turkey. Some models were proposed for specific countries. Jin et al. [17] proposed a generic model for the monthly average daily solar radiation for China. Ozgoren et al. [18] developed a model to estimate monthly mean daily sum solar radiation for Turkey. Khorasanizadeh and Mohammadi [19] focused on the region of Iran. It is possible to apply a model or modelling strategy developed for a particular region to another region of the Earth. Ertekin and Evrendilek [20] applied 18 existing models to estimate average daily solar radiation for Turkey. In an extensive and valuable work, Evrendilek and Ertekin [21] applied 78 existing models, in which 17 variables are considered, to the region of Turkey. This work also provides a comprehensive review of the models proposed for the estimation of the average daily horizontal global solar radiation. Evrendilek and Ertekin [21] identified successful models in terms of estimation accuracy for the region of Turkey. Sonmete et al. [22] considered 82 existing models for two cities of Turkey in a comparative case study.

Statistical modelling is a mechanical work. Almost every kind of model has its own assumptions. Violations of model assumptions affect the significance tests of model components, the accuracy of parameter estimates, and model selection tools. Distributional assumptions, influential observations, and possible multicollinearity structures between explanatory variables should be considered for a successful modelling task. The main problem in solar radiation estimation models is multicollinearity. When a variable appears in a model more than once, as in polynomial models, or when inter-correlated variables are included in the same model, strong collinearity structures are formed. For example, if temperature (T) and its square (T²) are included in a model as explanatory variables at the same time, the terms T and T² generate a collinearity pattern. If a model suffering from multicollinearity is used for prediction only at the points of the parameterisation dataset, multicollinearity does not cause serious problems. However, if the aim of modelling is to understand the process generating the dataset of interest, to identify the most suitable model, or to draw inferences from parameter estimates, the impact of multicollinearity is serious [23, p. 352]. As a result of multicollinearity, the variances of parameter estimates inflate, and small changes in observations cause considerable changes in the values of parameter estimates [24, p. 216]. Statistical measures and significance tests based on variances of estimators become unreliable; hence, some significant variables can appear to be non-significant. Specifically, in a polynomial model, various transformations of explanatory variables can be applied to minimize the effect of multicollinearity [25].
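
The collinearity between T and T² can be illustrated numerically. The sketch below is not the article's procedure: it uses synthetic temperatures, a hypothetical helper `vif`, and mean-centring as one common transformation (an assumption; the transformations referred to in [25] may differ). It computes the variance inflation factor (VIF) of each column by regressing it on the remaining columns.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic temperatures in degrees Celsius (illustrative only, not the article's data).
rng = np.random.default_rng(0)
T = rng.uniform(5.0, 30.0, size=200)

# Raw polynomial terms: T and T^2 are strongly correlated.
vif_raw = vif(np.column_stack([T, T ** 2]))

# Mean-centred terms: (T - mean) and (T - mean)^2 are nearly uncorrelated
# when the distribution of T is roughly symmetric.
Tc = T - T.mean()
vif_centred = vif(np.column_stack([Tc, Tc ** 2]))

print("VIF, raw terms:    ", vif_raw)
print("VIF, centred terms:", vif_centred)
```

With two predictors, both columns share the same VIF, so a large value for the raw terms and a value close to 1 for the centred terms shows how much of the collinearity is an artefact of the parameterisation rather than of the data.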

To compare the estimation accuracy of existing models, several statistical measures are calculated over parameterisation and validation datasets. Commonly used measures are the coefficient of determination (R²), its adjusted version (R²adj), mean percentage error (e), mean bias (MB), mean squared error (MSE), root mean square error (RMSE), relative percentage error (RPE), mean prediction bias (MPB), mean squared prediction error (MSPE), correlation coefficient, amount of toleration, average absolute bias (AAB), average absolute prediction bias (AAPB), average bias (AB), model selection criteria, entropy, one-way analysis of variance (ANOVA), and goodness of fit tests [3], [20], [21], [22]. It is very important to use suitable measures for the comparison of models. Note that R², R²adj, RMSE, MSE, and ANOVA can be unreliable in the presence of multicollinearity; hence, the measures based on both variance of estimators and bias are all untrustworthy.
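
A few of the measures listed above can be written out in their common textbook forms. The sketch below is an assumption-laden illustration: the function name `fit_measures`, the sign convention for MB (estimate minus observation), and the toy numbers are hypothetical, and the article's own definitions (given in its Table 4) may differ in detail.

```python
import numpy as np

def fit_measures(observed, estimated, n_predictors):
    """Common textbook forms of R^2, adjusted R^2, MB, MSE and RMSE.

    Hypothetical helper; MB is taken here as mean(estimated - observed).
    """
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(estimated, dtype=float)
    n = y.size
    err = yhat - y
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # The adjustment penalises R^2 for the number of predictors in the model.
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = ss_res / n
    return {
        "R2": r2,
        "R2_adj": r2_adj,
        "MB": err.mean(),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }

# Toy values for illustration only (not measurements from the article).
measures = fit_measures([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 18.0], n_predictors=1)
print(measures)
```

Computing the same dictionary once on the parameterisation data and once on the validation data gives the side-by-side comparison described in the text.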

In this article, estimation of the amount of average daily solar radiation over the region between 36° and 42° N latitudes and 26° and 42° E longitudes is taken into account. This region is called “Turkey” throughout the manuscript. Our aim is twofold. First, we would like to draw attention to the multicollinearity issue in solar radiation modelling. Evrendilek and Ertekin [20], [21] identified several models that give accurate estimates for the average daily solar radiation over Turkey. These models are revisited, and the effects of multicollinearity are identified and discussed over a dataset of 65 weather stations in Turkey. Appropriate transformations are applied to the explanatory variables to reduce the impact of multicollinearity, and the models are refitted to the transformed dataset. In this way, more reliable versions of the models are obtained for the estimation of daily solar radiation. However, the estimation performance of these models becomes unsatisfactory after the transformation used to eliminate multicollinearity. Therefore, we need a model that does not suffer from multicollinearity and, at the same time, gives more precise estimates of global solar radiation than the existing models. Based on the results of the multicollinearity analysis, our second aim is to derive a new model including logical and trigonometric terms for the estimation of average daily solar radiation by using the dataset of 65 locations in Turkey. The new model does not suffer from the multicollinearity problem. The estimation accuracy of our model is validated and compared with that of the previously proposed models. It is observed that the new model gives more precise estimates and predictions of the amount of average daily solar radiation than the previously proposed models.

In the second section, the dataset is described, the revisited models are presented, and the statistical measures used to compare and evaluate candidate models are defined. In the third section, the models are fitted to our dataset, and the multicollinearity issue is evaluated and discussed. A new model is then proposed for the estimation of solar radiation, and the new and existing models are compared and validated in terms of estimation and prediction accuracy. In the fourth section, conclusions are given.

Section snippets

Data description

Our dataset contains measurements of solar radiation at 65 climate stations of the Turkish State Meteorological Service (DMI) between 2000 and 2013. At these stations, solar radiation is recorded hourly by using pyranometers calibrated in the Calibration Centre of DMI, which is accredited by the Turkish Accreditation Agency. The recording period for solar radiation measurements is the same for all considered stations. Locations of the stations and quartiles of the distribution of altitudes of sites

Evaluation of existing models

Models 1–5 of Table 3 were fitted to the parameterisation dataset, and the statistical measures given in Table 4 were calculated over both parameterisation and validation datasets. The results are seen in Table 5.

The R²adj values for models 1–3 are high over both parameterisation and validation datasets. Values of R²adj calculated over the validation dataset are smaller than those calculated from the parameterisation dataset for models 1–3. However, it is notable that there is no

Conclusion

In this article, models used for the estimation and prediction of the amount of average daily solar radiation are taken into consideration over the region of Turkey. A set of models from the literature that give an accurate fit for the region of Turkey is revisited. The multicollinearity problem is frequently seen in the nonlinear or polynomial models used for the estimation of solar radiation. The impacts of multicollinearity on the significance tests of model terms, and model evaluation and
