The problem of multicollinearity in horizontal solar radiation estimation models and a new model for Turkey
Introduction
Solar energy is an important renewable energy source with essential effects on environmental processes. It is therefore directly related to indicators of a country's development, such as cereal yield, crop production, forest area, gas emissions, and energy production. Because solar energy can be converted to electricity without contributing to climate change or damaging water resources, it appears more beneficial than other methods of electricity generation and attracts great attention [1]. Accurate estimation of the amount of solar radiation absorbed or reflected at particular points on the Earth is an appealing research area. Throughout the article, we refer to “horizontal global solar radiation” as “solar radiation” unless otherwise stated.
Scientists use various statistical modelling tools to estimate the amount of solar radiation and compare the proposed models to identify the most accurate one. Because many meteorological and geographical variables, terrestrial factors, and types of models can be used in solar energy modelling, identifying an accurate model is a very complex task. These variables include temperature, soil temperature, precipitation, relative humidity, cloudiness, cloud types, sunshine duration, longitude, latitude, altitude, evapotranspiration, the Earth’s distance from the sun, solar elevation angle, aerosol concentration in the atmosphere, ozone concentration, air pollution, and the ground albedo [2]. Linear, trigonometric, polynomial, or logical models with main effect and interaction terms can be used to model the amount of solar radiation reaching a particular point on the Earth. One group of models comprises large-scale solar energy atlases such as the European Solar Radiation Atlas (ESRA), the Solar Radiation Potential Atlas (GEPA), and TEKNOLOGIS (see Demirhan et al. [3] for details). Each of these atlases has its own estimation model and algorithm and covers a large region.
Another group of models has been developed for a particular region of the Earth or for a limited time span. Many such models exist for cities of Turkey. Dinçer et al. [4] proposed a model for the site of Gebze, Turkey. Togrul and Onat [5] introduced a model for the estimation of solar radiation for Elazig, Turkey. Sen and Sahin [6] proposed a cumulative semivariogram approach for the month of January over 29 cities of Turkey. Saylan et al. [7] presented solar radiation estimates for three cities of Turkey. Sozen et al. [8] gave solar radiation estimates for 12 cities of Turkey by using artificial neural networks (ANN). Sozen et al. [9], [10] proposed other models for Turkey over 17 meteorological stations using ANN. Sozen et al. [11] gave a modelling strategy based on data from 18 cities of Turkey. Menges et al. [12] proposed a model for Konya, Turkey. Senkal and Kuleli [13] introduced a model for 12 cities of Turkey. Senkal [14] focused on nine cities of Turkey. Senkal et al. [15] gave solar radiation estimates for two cities of Turkey. Koca et al. [16] proposed a model for seven cities from the Mediterranean region of Turkey.
Some models were proposed for specific countries. Jin et al. [17] proposed a generic model for the monthly average daily solar radiation for China. Ozgoren et al. [18] developed a model to estimate the monthly mean daily sum of solar radiation for Turkey. Khorasanizadeh and Mohammadi [19] focused on the region of Iran. A model or modelling strategy developed for one region can also be applied to another region of the Earth. Ertekin and Evrendilek [20] applied 18 existing models to estimate average daily solar radiation for Turkey. In an extensive and valuable work, Evrendilek and Ertekin [21] applied 78 existing models, in which 17 variables are considered, to the region of Turkey. This work also provides a comprehensive review of the models proposed for the estimation of the average daily horizontal global solar radiation. Evrendilek and Ertekin [21] identified the models that are successful in terms of estimation accuracy for the region of Turkey. Sonmete et al. [22] considered 82 existing models for two cities of Turkey in a comparative case study.
Statistical modelling is a mechanical process, and almost every kind of model has its own assumptions. Violations of model assumptions affect the significance tests of model components, the accuracy of parameter estimates, and model selection tools. Distributional assumptions, influential observations, and possible multicollinearity structures among explanatory variables should be taken into account for a successful modelling task. The main problem in solar radiation estimation models is multicollinearity. When a variable appears in a model more than once, as in polynomial models, or when inter-correlated variables are included in the same model, strong collinearity structures are formed. For example, if temperature (T) and its square (T²) are both included in a model as explanatory variables, the terms T and T² generate a collinearity pattern. If a model suffering from multicollinearity is used for prediction only at the points of the parameterisation dataset, multicollinearity does not cause serious problems. However, if the aim of modelling is to understand the process generating the dataset of interest, to identify the most suitable model, or to draw inferences from parameter estimates, the impact of multicollinearity is serious [23, p. 352]. As a result of multicollinearity, the variances of parameter estimates inflate, and small changes in the observations cause considerable changes in the values of the parameter estimates [24, p. 216]. Statistical measures and significance tests based on the variances of estimators become unreliable; hence, some significant variables can appear to be non-significant. In a polynomial model specifically, various transformations of the explanatory variables can be applied to minimise the effect of multicollinearity [25].
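The collinearity between T and T² described above can be illustrated with a small numerical sketch. The code below computes variance inflation factors (VIF) for a quadratic temperature model, first with raw terms and then after centring T on its mean. Centring is only one commonly used transformation and is assumed here for illustration; this excerpt does not state which transformations were applied in the article. The temperatures are synthetic, and the helper name `vif` is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic temperatures in degrees Celsius (assumption: not the article's data).
rng = np.random.default_rng(0)
T = rng.uniform(5.0, 35.0, size=200)

# Raw polynomial terms T and T^2 are strongly correlated.
vif_raw = vif(np.column_stack([T, T**2]))

# Centring T before squaring removes most of that correlation.
Tc = T - T.mean()
vif_centred = vif(np.column_stack([Tc, Tc**2]))

print("VIF, raw terms:    ", vif_raw)
print("VIF, centred terms:", vif_centred)
```

A VIF well above 10 is often read as a sign of problematic multicollinearity, while values near 1 indicate nearly uncorrelated terms. In this sketch the raw terms fall in the first range and the centred terms in the second.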
When the estimation accuracy of existing models is compared, several statistical measures are calculated over the parameterisation and validation datasets. Commonly used measures are the coefficient of determination (R²), its adjusted version (adjusted R²), mean percentage error (e), mean bias (MB), mean squared error (MSE), root mean square error (RMSE), relative percentage error (RPE), mean prediction bias (MPB), mean squared prediction error (MSPE), correlation coefficient, amount of toleration, average absolute bias (AAB), average absolute prediction bias (AAPB), average bias (AB), model selection criteria, entropy, one-way analysis of variance (ANOVA), and goodness of fit tests [3], [20], [21], [22]. It is very important to use suitable measures for the comparison of models. Note that measures such as RMSE, MSE, and ANOVA can be unreliable in the presence of multicollinearity; hence, the measures based on both the variance of estimators and bias are all untrustworthy.
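As a minimal sketch of how a few of these measures are computed, the code below evaluates R², adjusted R², MSE, and RMSE for a set of observed and fitted values. It uses the standard textbook definitions, which are an assumption here and may differ in detail from the definitions given in the article's tables. The function name `fit_measures` and the numbers are hypothetical.

```python
import numpy as np


def fit_measures(y, y_hat, n_params):
    """Return R2, adjusted R2, MSE and RMSE for observed y and fitted y_hat.

    n_params is the number of explanatory terms, excluding the intercept.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)      # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    mse = sse / n
    return {"R2": r2, "adj_R2": adj_r2, "MSE": mse, "RMSE": float(np.sqrt(mse))}


# Illustrative values only (assumption: not measurements from the article).
y_obs = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_fit = np.array([11.0, 12.0, 13.0, 16.0, 18.0])

measures = fit_measures(y_obs, y_fit, n_params=1)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to the parameterisation and validation datasets separately, so that the two sets of values can be compared.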
In this article, we consider the estimation of the average daily solar radiation over the region between 36° and 42° N latitude and 26° and 42° E longitude. This region is called “Turkey” throughout the manuscript. Our aim is twofold. First, we would like to draw attention to the multicollinearity issue in solar radiation modelling. Ertekin and Evrendilek [20] and Evrendilek and Ertekin [21] identified several models that give accurate estimates of the average daily solar radiation over Turkey. We revisit these models and examine and discuss the effects of multicollinearity using a dataset of 65 weather stations in Turkey. Appropriate transformations are applied to the explanatory variables to reduce the impact of multicollinearity, and the models are refitted to the transformed dataset. In this way, more reliable versions of the models are obtained for the estimation of daily solar radiation. However, the estimation performance of these models becomes unsatisfactory after the transformations used to eliminate multicollinearity. We therefore need a model that does not suffer from multicollinearity and, at the same time, gives more precise estimates of global solar radiation than the existing models. Based on the results of the multicollinearity analysis, our second aim is to derive a new model including logical and trigonometric terms for the estimation of average daily solar radiation using the dataset of 65 locations in Turkey. The new model does not suffer from the multicollinearity problem. Its estimation accuracy is validated and compared with that of the previously proposed models. It is observed that the new model gives more precise estimates and predictions of the average daily solar radiation than the previously proposed models.
In the second section, the dataset is described, the revisited models are presented, and the statistical measures used to compare and evaluate candidate models are defined. In the third section, the models are fitted to our dataset, and the multicollinearity issue is evaluated and discussed; a new model is proposed for the estimation of solar radiation, and the new and existing models are compared and validated in terms of estimation and prediction accuracy. In the fourth section, conclusions are given.
Section snippets
Data description
Our dataset contains measurements of solar radiation at 65 climate stations of the Turkish State Meteorological Service (DMI) between 2000 and 2013. At these stations, solar radiation is recorded hourly using pyranometers calibrated in the Calibration Centre of DMI, which is accredited by the Turkish Accreditation Agency. The recording period for solar radiation measurements is the same for all considered stations. Locations of the stations and quartiles of the distribution of altitudes of sites
Evaluation of existing models
Models 1–5 of Table 3 were fitted to the parameterisation dataset, and the statistical measures given in Table 4 were calculated over both the parameterisation and validation datasets. The results are given in Table 5.
The values for the models 1–3 are high over both parameterisation and validation datasets. Values of calculated over the validation dataset are smaller than those calculated from parameterization dataset for the models 1–3. However, it is notable that there is no
Conclusion
In this article, models used for the estimation and prediction of the average daily solar radiation are considered over the region of Turkey. A set of models from the literature that fit the region of Turkey accurately is revisited. The multicollinearity problem is frequently seen in the nonlinear or polynomial models used for the estimation of solar radiation. The impacts of multicollinearity on the significance tests of model terms, and model evaluation and
References (34)
- et al. Renewable energy potential and utilization in Turkey. Energy Convers Manage (2003)
- et al. Statistical comparison of global solar radiation estimation models over Turkey. Energy Convers Manage (2013)
- et al. A simple technique for estimating solar radiation parameters and its application for Gebze. Energy Convers Manage (1996)
- et al. Spatial interpolation and estimation of solar irradiation by cumulative semivariograms. Sol Energy (2001)
- et al. Solar energy potential for heating cooling systems in big cities of Turkey. Energy Convers Manage (2002)
- et al. Estimation of solar radiation in Turkey by artificial neural network using meteorological and geographical data. Energy Convers Manage (2004)
- et al. Use of artificial neural networks for mapping of solar potential in Turkey. Appl Energy (2004)
- et al. Solar energy potential in Turkey. Appl Energy (2005)
- et al. Evaluation of global solar radiation models for Konya, Turkey. Energy Convers Manage (2006)
- et al. Estimation of solar radiation over Turkey using artificial neural network and satellite data. Appl Energy (2009)
- Modeling of solar radiation using remote sensing and artificial neural network in Turkey. Energy
- General formula for estimation of monthly average daily global solar radiation in China. Energy Convers Manage
- Estimation of global solar radiation using ANN over Turkey. Expert Syst Appl
- Prediction of daily global solar radiation by day of the year in four cities located in the sunny regions of Iran. Energy Convers Manage
- Spatio-temporal modelling of global solar radiation dynamics as a function of sunshine duration for Turkey. Agric Forest Meteorol
- A model comparison for daylength as a function of latitude and day of year. Ecol Model
- Trends of the global radiation and sunshine hours in 1961–1998 and their relationships in China. Energy Convers Manage