Elsevier

Atmospheric Research

Volume 212, 1 November 2018, Pages 240-258

Statistical downscaling of precipitation using machine learning techniques

https://doi.org/10.1016/j.atmosres.2018.05.022

Highlights

  • The Polynomial kernel is suitable for Support Vector and Relevance Vector Machines.

  • Relevance Vector Machine is recommended for drought analysis.

  • Relevance Vector Machine and Neural Networks are recommended for flood analysis.

Abstract

Statistical models were developed for downscaling reanalysis data to monthly precipitation at 48 observation stations scattered across the Australian State of Victoria, belonging to wet, intermediate and dry climate regimes. Downscaling models were calibrated over the period 1950–1991 and validated over the period 1992–2014 for each calendar month, for each station, using four machine learning techniques: (1) Genetic Programming (GP), (2) Artificial Neural Networks (ANNs), (3) Support Vector Machine (SVM), and (4) Relevance Vector Machine (RVM). It was found that, irrespective of the climate regime and the machine learning technique, downscaling models tend to simulate the average better than other statistics, and to under-estimate the standard deviation and the maximum of the observed precipitation. Also, irrespective of the climate regime and the machine learning technique, at the majority of stations downscaling models tended to over-estimate low to mid percentiles of precipitation (i.e. below the 50th percentile) and to under-estimate high percentiles of precipitation (i.e. above the 90th percentile). The over-estimation of low to mid percentiles of precipitation was more pronounced at stations located in drier climate, irrespective of the machine learning technique. Based on the results of this investigation, the use of RVM or ANN over SVM or GP for developing downscaling models can be recommended for a study such as flood prediction, which involves the consideration of high extremes of precipitation. Also, RVM can be recommended over GP, ANN or SVM in developing downscaling models for a study such as drought analysis, which involves the consideration of low extremes of precipitation. Furthermore, it was found that, irrespective of the climate regime, the SVM and RVM-based precipitation downscaling models showed the best performance with the Polynomial kernel.
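
The comparison of modelled and observed statistics described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`percentile`, `percent_bias`, `bias_report`), the choice of the 25th and 95th percentiles, the interpolation rule and the precipitation values are all assumptions made for illustration. The sketch expresses the bias of each modelled statistic as a percentage of the observed one, so that a positive value indicates over-estimation and a negative value under-estimation.

```python
import statistics


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def percent_bias(modelled_stat, observed_stat):
    """Percentage bias of a modelled statistic relative to the observed one."""
    return 100.0 * (modelled_stat - observed_stat) / observed_stat


def bias_report(observed, modelled):
    """Percentage bias in the mean, standard deviation, maximum and two percentiles."""
    stats = {
        "mean": statistics.mean,
        "std": statistics.stdev,
        "max": max,
        "p25": lambda v: percentile(v, 25),  # a low percentile (assumed choice)
        "p95": lambda v: percentile(v, 95),  # a high percentile (assumed choice)
    }
    return {name: percent_bias(f(modelled), f(observed)) for name, f in stats.items()}


# Hypothetical monthly precipitation totals (mm) for one station and one
# calendar month; invented values, not data from the study.
observed = [5.0, 12.0, 20.0, 35.0, 60.0, 110.0]
modelled = [12.0, 18.0, 25.0, 36.0, 58.0, 93.0]

report = bias_report(observed, modelled)
for name, bias in report.items():
    print(f"{name}: {bias:+.1f}%")
```

The invented series were chosen so that the mean is reproduced while the spread is compressed, which gives a positive bias at the low percentile and negative biases for the standard deviation, the maximum and the high percentile, mirroring the qualitative pattern reported in the abstract.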

Introduction

The assessment of water resources in catchments under changing climate is important, as the spatial and temporal variability of water resources is highly influenced by changes in the climate (Pascual et al., 2015). General Circulation Models (GCMs) are considered the most advanced tools available for obtaining global scale climate change projections of hydroclimatic variables (Bates et al., 2010). GCMs are forced with likely future greenhouse gas (GHG) emission scenarios in order to produce scenarios of global climate likely to occur in the future. Owing to the coarse spatial scale at which GCMs operate, they are unable to resolve sub-grid scale processes such as cloud physics and land surface processes, and the topography of the Earth is only coarsely represented within the structure of GCMs (Iorio et al., 2004). Therefore, projections of GCMs cannot be readily used in catchment scale applications such as hydrologic modelling or water resources allocation modelling.

In order to bridge the spatial scale gap between the coarse scale GCM outputs and catchment scale hydroclimatic variables, statistical and dynamic downscaling approaches have been developed (Wilby and Wigley, 1997). In statistical downscaling, empirical statistical relationships between GCM outputs and catchment scale hydroclimatic variables are developed for this purpose (Benestad et al., 2008). In dynamic downscaling, physics-based equations are used for the same purpose (Fowler and Wilby, 2010). Statistical downscaling has gained wide popularity due to its low computational cost and simplicity (Okkan and Inan, 2014; Rashid et al., 2015; Sachindra et al., 2016) compared to its counterpart, dynamic downscaling.
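
The idea of an empirical statistical relationship between a large scale predictor and station precipitation can be sketched in Python as follows. This is a deliberately simplified illustration and not the method of this paper: it assumes a single predictor and ordinary least squares, whereas the study uses GP, ANN, SVM and RVM with reanalysis predictors. The helper names (`fit_line`, `rmse`) and the synthetic records are hypothetical. The only elements taken from the text are the separate model per calendar month and the calibration (1950–1991) and validation (1992–2014) periods.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


# Synthetic records: (year, month) -> (predictor, precipitation in mm).
# Both series are invented for illustration only.
rng = random.Random(0)
records = {}
for year in range(1950, 2015):
    for month in range(1, 13):
        predictor = rng.gauss(0.0, 1.0)
        rain = max(0.0, 40.0 + 15.0 * predictor + rng.gauss(0.0, 5.0))
        records[(year, month)] = (predictor, rain)

models, validation_rmse = {}, {}
for month in range(1, 13):
    calib = [records[(y, month)] for y in range(1950, 1992)]  # 1950-1991
    valid = [records[(y, month)] for y in range(1992, 2015)]  # 1992-2014
    intercept, slope = fit_line([p for p, _ in calib], [r for _, r in calib])
    models[month] = (intercept, slope)
    # Precipitation cannot be negative, so predictions are clipped at zero.
    predicted = [max(0.0, intercept + slope * p) for p, _ in valid]
    validation_rmse[month] = rmse([r for _, r in valid], predicted)

print(len(models), "monthly models fitted")
```

Fitting one model per calendar month lets the predictor-precipitation relationship differ between seasons, at the cost of fewer calibration samples per model (42 in this sketch).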

According to Wilby et al. (2004), statistical downscaling approaches can be further sub-divided into three categories: regression-based approaches, weather classification-based approaches and approaches based on weather generators. Of these three categories, regression-based statistical downscaling approaches have gained the most popularity, owing to their simplicity in application. The regression techniques widely used in statistical downscaling include Multi Linear Regression (MLR) (Sachindra et al., 2014a), Generalized Linear Models (GLMs) (Beecham et al., 2014), Artificial Neural Networks (ANNs) (Tripathi et al., 2006; Ahmed et al., 2015), Support Vector Machine (SVM) (Sachindra et al., 2013; Goly et al., 2014), Relevance Vector Machine (RVM) (Ghosh and Mujumdar, 2008; Okkan and Inan, 2014), Genetic Programming (GP) (Coulibaly, 2004; Sachindra et al., 2018) and Gene Expression Programming (GEP) (Hashmi et al., 2011; Sachindra et al., 2016). Owing to their ability to learn from data and their implementation as computer algorithms, techniques such as ANN, SVM, RVM, and GP are often called machine learning techniques.

Past studies have compared the performance of downscaling approaches developed with machine learning techniques against those developed with traditional statistical techniques; some examples follow. In a downscaling exercise, Coulibaly (2004) found that GP-based downscaling models simulated both daily minimum and maximum temperature better than MLR-based downscaling models. In a streamflow downscaling study, Sachindra et al. (2013) found that a Least Square Support Vector Machine (LSSVM)-based downscaling model captured the observed streamflow better than an MLR-based model. Duhan and Pandey (2015) found that SVM-based downscaling models simulated the observed monthly maximum and minimum temperature better than ANN and MLR-based models. Goly et al. (2014) employed MLR, positive coefficient regression (PCR), stepwise regression (SR), and SVM for downscaling large scale atmospheric variables to monthly precipitation, and concluded that SVM-based downscaling models outperformed models developed with all other techniques in simulating statistics of monthly observed precipitation. According to the above studies, downscaling models developed with machine learning techniques perform better than downscaling models developed with traditional statistical regression techniques.

Although the downscaling literature contains many studies comparing models developed with various techniques, it lacks a single study that assesses the performance of models developed using the machine learning techniques GP, ANN, SVM and RVM for downscaling large scale atmospheric information to catchment scale precipitation under diverse climate (relatively wet, intermediate and relatively dry). Also, the current literature does not contain a detailed investigation of the selection of a suitable kernel when applying SVM and RVM techniques to develop downscaling models. This paper assesses the effectiveness of GP, ANN, SVM and RVM in the development of models for downscaling large scale atmospheric information to catchment scale monthly precipitation under diverse climate. In addition, it investigates the suitability of a number of different kernel functions in SVM and RVM-based downscaling models under diverse climate.
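
To make the notion of a kernel function concrete, the Python sketch below evaluates a polynomial kernel in one commonly used form, (gamma * <x, y> + coef0) ** degree, and builds the resulting kernel (Gram) matrix for a few predictor vectors. The parameterisation, the default values, the function names and the sample vectors are assumptions for illustration; this excerpt does not state the exact form or parameter values used in the study.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel in the assumed form (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def gram_matrix(samples, kernel):
    """Pairwise kernel values between all samples (symmetric matrix)."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical standardised predictor vectors (two predictors per sample).
samples = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
K = gram_matrix(samples, polynomial_kernel)

for row in K:
    print(row)
```

Kernel-based methods such as SVM and RVM work with these pairwise similarities rather than with the raw predictors, which is why the choice of kernel can change model performance.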

Section snippets

Study area and data

For the case study, 48 precipitation observation stations located across Victoria (237,000 km²), Australia were selected. These precipitation observation stations were selected such that they contain records of observations over the period 1950–2014 with minimal missing data and represent relatively wet, intermediate and relatively dry climate regimes. Names of the precipitation observation stations, their locations along with the long-term statistics of observed

Methodology

Sub-section 3.1 provides the details of the theory of machine learning techniques used in this study, and Sub-section 3.2 details the application of the machine learning techniques in the development of downscaling models.

Results of selection of kernels for SVM and RVM-based downscaling models

It was observed that the kernels used in the SVM and RVM-based downscaling models which produced the best performance in calibration in terms of RMSE (the best kernel), varied from one calendar month to another, even at the same station. Table 4 shows the percentage of selection of a given kernel as the best in relatively wet, intermediate and relatively dry climate regimes for SVM and RVM. The percentage of selection of a kernel as the best for a given climate regime was calculated using Eq.

Differences in bias in the statistics of modelled precipitation in calibration and validation

It was seen that there are noticeable differences in the percentages of bias in the statistics of precipitation simulated by the downscaling models in the calibration and validation periods, at certain stations. It is natural for a downscaling model to display a relatively larger bias percentage in validation in comparison to that in calibration. This is because during calibration, the model parameters are optimised (values of parameters are allowed to change freely) in order to achieve the

Conclusions

The following conclusions were drawn from this investigation:

Irrespective of the climate regime and the machine learning technique, in both calibration and validation at the majority of the stations downscaling models showed an over-estimating trend of low to mid percentiles (i.e. below the 50th percentile) of precipitation and under-estimating trend of high percentiles of precipitation (i.e. above the 90th percentile). The over-estimating trend of low to mid percentiles and under-estimating trend

References (90)

  • M.Z. Hashmi et al.

    Statistical downscaling of watershed precipitation using Gene Expression Programming (GEP)

    Environ. Model. Softw.

    (2011)
  • Z. He et al.

    A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region

    J. Hydrol.

    (2014)
  • C.L. Huang et al.

    A GA-based feature selection and parameters optimization for support vector machines

    Expert Syst. Appl.

    (2006)
  • D. Joshi et al.

    Databased comparison of sparse Bayesian learning and multiple linear regression for statistical downscaling of low flow indices

    J. Hydrol.

    (2013)
  • S. Kouhestani et al.

    Projection of climate change impacts on precipitation using soft-computing techniques: a case study in Zayandeh-rud Basin, Iran

    Glob. Planet. Chang.

    (2016)
  • C. Leys et al.

    Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median

    J. Exp. Soc. Psychol.

    (2013)
  • F. Mekanik et al.

    Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    J. Hydrol.

    (2013)
  • H. Meyer et al.

    Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    Atmos. Res.

    (2016)
  • V. Nourani et al.

    A combined neural-wavelet model for prediction of Ligvanchai watershed precipitation

    Eng. Appl. Artif. Intell.

    (2009)
  • K. Rasouli et al.

    Daily streamflow forecasting by machine learning methods with weather and climate inputs

    J. Hydrol.

    (2012)
  • M.S. Roodposhti et al.

    Drought sensitivity mapping using two one-class support vector machine algorithms

    Atmos. Res.

    (2017)
  • A. Sarhadi et al.

    Water resources climate change projections using supervised nonlinear and multivariate soft computing techniques

    J. Hydrol.

    (2016)
  • B. Selle et al.

    Testing the structure of a hydrological model using genetic programming

    J. Hydrol.

    (2011)
  • M.S. Tehrany et al.

    Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS

    J. Hydrol.

    (2014)
  • B. Timbal et al.

    Generalization of a statistical downscaling model to provide local climate change projections for Australia

    Environ. Model. Softw.

    (2009)
  • S. Tripathi et al.

    Downscaling of precipitation for climate change scenarios: a support vector machine approach

    J. Hydrol.

    (2006)
  • P.A. Whigham et al.

    Modelling rainfall-runoff using genetic programming

    Math. Comput. Model.

    (2001)
  • B. Yadav et al.

    Discharge forecasting using an online sequential extreme learning machine (OS-ELM) model: a case study in Neckar River, Germany

    Measurement

    (2016)
  • Z.M. Yaseen et al.

    Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq

    J. Hydrol.

    (2016)
  • K. Ahmed et al.

    Multilayer perceptron neural network for downscaling rainfall in arid region: a case study of Baluchistan, Pakistan

    J. Earth Syst. Sci.

    (2015)
  • A. Anandhi et al.

    Downscaling precipitation to river basin in India for IPCC SRES scenarios using support vector machine

    Int. J. Climatol.

    (2008)
  • A. Anandhi et al.

    Role of predictors in downscaling surface temperature to river basin in India for IPCC SRES scenarios using support vector machine

    Int. J. Climatol.

    (2009)
  • B.C. Bates et al.

    Incorporating Climate Change in Water Allocation Planning, Waterlines Report

    (2010)
  • S. Beecham et al.

    Statistical downscaling of multi-site daily rainfall in a south Australian catchment using a generalized linear model

    Int. J. Climatol.

    (2014)
  • R. Benestad et al.

    Empirical-Statistical Downscaling

    (2008)
  • S.S. Brands et al.

    On the use of reanalysis data for downscaling

    J. Clim.

    (2012)
  • Bureau of Meteorology
  • L. Campozano et al.

    Comparison of statistical downscaling methods for monthly total precipitation: case study for the Paute River basin in southern Ecuador

    Adv. Meteorol.

    (2016)
  • G.P. Compo et al.

    The twentieth century reanalysis project

    Q. J. Roy. Meteorol. Soc.

    (2011)
  • P. Coulibaly

    Downscaling daily extreme temperatures with genetic programming

    Geophys. Res. Lett.

    (2004)
  • R.C. Deo et al.

    Estimation of monthly evaporative loss using relevance vector machine, extreme learning machine and multivariate adaptive regression spline models

    Stoch. Env. Res. Risk A.

    (2016)
  • D. Duhan et al.

    Statistical downscaling of temperature using three techniques in the Tons River basin in Central India

    Theor. Appl. Climatol.

    (2015)
  • H.J. Fowler et al.

    Detecting changes in seasonal precipitation extremes using regional climate model projections: implications for managing fluvial flood risk

    Water Resour. Res.

    (2010)
  • S. Ghosh

    SVM-PGSL coupled approach for statistical downscaling to predict rainfall from GCM output

    J. Geophys. Res.

    (2010)
  • A. Goly et al.

    Development and evaluation of statistical downscaling models for monthly precipitation

    Earth Interact.

    (2014)