Elsevier

Agricultural and Forest Meteorology

Volume 263, 15 December 2018, Pages 428-448
Cotton yield prediction with Markov Chain Monte Carlo-based simulation model integrated with genetic programing algorithm: A new hybrid copula-driven approach

https://doi.org/10.1016/j.agrformet.2018.09.002

Highlights

  • Markov Chain Monte Carlo-copula integrated with genetic programing model is designed.

  • GP-MCMC-copula model incorporates climate-based inputs for cotton yield prediction.

  • GP-MCMC copula model is a pertinent data-intelligent tool for prediction of cotton yield.

Abstract

Reliable data-driven models designed to accurately estimate the yield of cotton, an important agricultural commodity, can be adopted by farmers, agricultural system modelling experts and agricultural policy-makers in strategic decision-making processes. In this paper, a hybrid genetic programing (GP) model integrated with the Markov Chain Monte Carlo (MCMC) based copula technique is developed to incorporate climate-based inputs as the predictors of cotton yield for selected study regions: Faisalabad (31.4504 °N, 73.1350 °E), Multan (30.1984 °N, 71.4687 °E) and Nawabshah (26.2442 °N, 68.4100 °E), which are important cotton-growing hubs in the developing nation of Pakistan. Several GP-MCMC-copula models were developed, each with one of the well-known copula families (i.e., Gaussian, Student's t, Clayton, Gumbel, Frank and Fischer-Hinzmann functions), to screen and select an optimal cotton yield forecast model for the present study region. The results of the GP-MCMC based hybrid copula model were compared with those of a standalone GP model and the MCMC based copula model through statistical analysis of the predicted yield, based on the correlation coefficient (r), Willmott’s index (WI), Nash-Sutcliffe coefficient (NSE), root mean squared error (RMSE) and mean absolute error (MAE) in the independent test phase. Model performance was further evaluated by the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Maximum Likelihood (MaxL) for the GP-MCMC based copula as well as the MCMC based copula model. The GP-MCMC-Clayton copula model generated the most accurate result for the Multan station. For the optimal GP-MCMC-Clayton copula model, the model evaluation metrics for Multan were (LM≈0.952; RRMSE≈2.107%; RRMAE≈1.771%), followed by the MCMC based Gaussian copula model (LM≈0.895; RRMSE≈4.541%; RRMAE≈0.3.214%) and the standalone GP model (LM≈0.132; RRMSE≈23.638%; RRMAE≈22.652%), indicating the superiority of the GP-MCMC-Clayton copula model with respect to the other benchmark models. The GP-MCMC based copula model was also found to be superior for the Faisalabad and Nawabshah stations, as confirmed by the AIC, BIC and MaxL metrics and by a larger value of the Legates-McCabe’s (LM) index, used in conjunction with the relative root mean squared error (RRMSE) and the relative mean absolute error (RRMAE), both expressed as percentages. Accordingly, it is averred that the developed GP-MCMC copula model can be considered a pertinent data-intelligent tool for the accurate prediction of cotton yield that utilizes readily available climate datasets in agricultural regions, and that it is of relevance to agricultural yield simulation and sectoral decision-making.
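
The evaluation metrics named in the abstract can be illustrated with a short sketch. The Python function below uses the textbook definitions of these indices. The function name `evaluation_metrics` is hypothetical, and normalizing both RRMSE and RRMAE by the mean observed yield is an assumption, since the paper's own equations are not reproduced in this excerpt.

```python
import numpy as np


def evaluation_metrics(obs, pred):
    """Return common agreement and error metrics for observed vs. predicted yield.

    Textbook definitions are assumed; the paper's own equations may differ,
    in particular in how RRMSE and RRMAE are normalized.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    obs_mean = obs.mean()

    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))

    # Nash-Sutcliffe efficiency: 1 minus squared error over observed variance.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs_mean) ** 2)
    # Willmott's index of agreement (original squared form).
    wi = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2
    )
    # Legates-McCabe index: absolute-value analogue of NSE.
    lm = 1.0 - np.sum(np.abs(err)) / np.sum(np.abs(obs - obs_mean))

    return {
        "r": np.corrcoef(obs, pred)[0, 1],
        "WI": wi,
        "NSE": nse,
        "LM": lm,
        "RMSE": rmse,
        "MAE": mae,
        "RRMSE_pct": 100.0 * rmse / obs_mean,  # assumed normalization
        "RRMAE_pct": 100.0 * mae / obs_mean,   # assumed normalization
    }
```

Because LM uses absolute rather than squared deviations, it is less dominated by a few large errors than NSE or WI, which may be why it is reported alongside the relative errors when ranking the models.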

Introduction

Timely information on crop yield is important for agriculture-dependent nations (e.g., Pakistan), as it can inform agricultural policy-making, forward planning and agricultural markets. Agriculture in Pakistan contributes about 21% of the country’s GDP (Sarwar, 2014), and cotton is an important cash crop within the sector. Cotton is an integral commodity for the economic development of Pakistan: the nation is highly dependent on the cotton industry and its related textile sector, and the crop has therefore been given a principal status in the country. Cotton is grown from May to August as an industrial crop on 15% of the nation's available land area, producing 15 million bales during 2014-15 (Reporter, 2015). Pakistan ranks fourth among cotton growers and is the third largest exporter and fourth largest consumer (Banuri, 1998). In 2013, about 1.6 million farmers (out of a total of 5 million in all sectors) were engaged in cotton farming, cultivating more than 3 million hectares (Banuri, 1998; Reporter, 2015).

Data-intelligent models that utilize past data can offer accurate solutions to problems related to the projection of future trends in agriculture, crop yield, rainfall and drought that affect agricultural productivity (Ali et al., 2018a, b; Bauer, 1975; Nguyen-Huy et al., 2017, 2018). Machine learning models, which are highly non-linear, utilize input features that are valuable for the prediction of crop yield. In the work of Kern et al. (2018), multiple linear regression models were constructed to simulate the yield of the four major crop types in Hungary using environmental and remote sensing information. Moreover, Bokusheva et al. (2016) developed copula models for crop yields on VH indices, and Craparo et al. (2015) built an ARIMA model to forecast the decline of coffee yield in Tanzania. Debnath et al. (2013) predicted the area and yield of cotton in India using an ARIMA model. Blanc et al. (2008) utilized a multiple regression model of the main climatic determinants of rain-fed cotton yield in West Africa. Yang et al. (2014) assessed cotton yield and water demand under climate change and future adaptation measures using the APSIM-OzCot model. Chen et al. (2011) studied the impact of climate change on cotton production and water consumption in China using the COSIM model. Hearn (1994) designed a simulation model named OZCOT for cotton crop management in Australia. Papageorgiou et al. (2011) predicted cotton yield in Greece using fuzzy cognitive maps. Jin and Xu (2012) estimated cotton yield in China using the Carnegie Ames Stanford Approach model. The aforementioned models were developed to study the impacts of climate change on cotton yield prediction.

In summary, the existing literature shows that few studies in Pakistan have developed methods for the prediction of cotton yield, despite the country's relevance as a world leader in cotton production. Ali et al. (2015) used an ARIMA model to forecast the production of sugarcane and cotton crops in Pakistan for 2013–2030. Hina Ali et al. (2013) also analyzed production forecasting of cotton in Pakistan. Ahmad et al. (2017) developed an ARIMA model to forecast the area, production and yield of major crops in Pakistan. Raza and Ahmad (2015) studied the impact of climate change on cotton productivity in Punjab and Sindh, Pakistan, using fixed effect models. Ayaz et al. (2015) studied the effect of weather on the cotton crop in Sindh, Pakistan. Carpio and Ramirez (2002) used yield and acreage models to forecast cotton yield in India, Pakistan and Australia. Ahmad (1975) designed a time series prediction for the supply response of cotton in Punjab, Pakistan.

These previous studies indicate that the prediction of cotton yield has been based primarily on the effect of climate change, mostly through ARIMA models. In addition, all of these studies were conducted for a large area, either a whole province or a national region, and not for a small locality. Moreover, advanced data-intelligent algorithms have seen limited application to more accurate prediction at a micro scale, where they could support decision-making in precision agriculture and farming systems and may shape how future farming trends are analyzed. To address these issues, there is an apparent need for data-intelligent models that predict cotton yield more accurately and at a much finer scale than attempted previously. In this study, for the first time, a hybrid genetic programing integrated with a Markov Chain Monte Carlo (GP-MCMC) based copula model has been developed for the prediction of cotton yield in Faisalabad, Multan and Nawabshah in Pakistan. The novelty of this study lies in applying the as yet untested GP-MCMC based copula models to the prediction of cotton yield in Pakistan.

To advance the application of copula models, especially in agriculture where they have rarely been applied, the present study addresses four primary objectives: (1) to apply GP-MCMC based copula models, MCMC based copula models and a standalone GP model to determine which of these models is the most accurate data-intelligent tool for predicting cotton yield in the developing nation of Pakistan; (2) to model the influence of climate data (i.e., temperature, rainfall and humidity) in order to predict cotton yield effectively in the selected districts of Punjab and Sindh, the primary agricultural hubs in Pakistan; (3) to develop and optimize the copula-based models by tuning the GP and MCMC techniques, and to evaluate their performance in comparison with the MCMC based copula and standalone GP models; and (4) to validate the predictive ability of each model with respect to cotton yield in Pakistan, making a major contribution to the use of data-driven models for agricultural yield estimation.

Section snippets

Theoretical framework

This section presents an overview of the proposed predictive GP-MCMC based copula models and their comparative counterparts, the MCMC based copula models and the standalone GP model.
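
To make the idea of an MCMC based copula concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bivariate Clayton copula, a flat prior on the dependence parameter theta over (0, theta_max), and a plain random-walk Metropolis sampler; the paper's actual sampler, priors and tuning are not given in this excerpt. All function names are hypothetical.

```python
import numpy as np


def pseudo_obs(x):
    """Map a sample to (0, 1) via scaled ranks (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)


def clayton_logpdf(u, v, theta):
    """Log-density of the bivariate Clayton copula for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def metropolis_clayton(u, v, n_iter=5000, step=0.3, theta0=1.0,
                       theta_max=30.0, seed=0):
    """Random-walk Metropolis chain for theta under a flat prior on (0, theta_max)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    loglik = clayton_logpdf(u, v, theta).sum()
    chain = np.empty(n_iter)
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        # Proposals outside the prior support are rejected outright.
        if 0.0 < proposal < theta_max:
            loglik_prop = clayton_logpdf(u, v, proposal).sum()
            if np.log(rng.uniform()) < loglik_prop - loglik:
                theta, loglik = proposal, loglik_prop
        chain[i] = theta
    return chain


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    u, v = sample_clayton(500, 2.0, rng)          # synthetic data, not yield data
    chain = metropolis_clayton(u, v, n_iter=3000, seed=2)
    print("posterior mean of theta:", chain[1000:].mean())
```

In a GP-MCMC setting, `u` and `v` would be the pseudo-observations of the GP-forecast yield and the observed yield (for example via `pseudo_obs`), and the same sampler structure could be repeated for each candidate copula family by swapping the log-density.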

Materials and method

This section describes the acquired climate and cotton yield data, the study regions, the design of the predictive models and the performance criteria.

Results and discussion

The results of the GP-MCMC based copula model have been compared against those of the MCMC based copula models and a standalone GP model based on the evaluation criteria described above (Eqs. (10)–(21)).
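
Alongside the error metrics, the abstract names AIC and BIC as criteria for comparing copula fits. The sketch below uses the standard definitions of both criteria; the function names are hypothetical, and whether these forms match the paper's numbered equations exactly is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

Lower values indicate a better trade-off between fit and complexity. For copulas with the same number of parameters fitted to the same sample, both criteria give the same ranking as the maximized log-likelihood, so they matter most when families with different parameter counts are compared.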

Fig. 6(a–c) demonstrates the joint dependence structure between GP based forecasted cotton yield and observed cotton yield anomalies using MCMC-copula models for the 33-year seasonal dataset. The asymmetric and skewed dependence structure of the

Conclusion

This paper has developed a suite of GP-MCMC based copula models using climate data (temperature, rainfall, humidity) as predictor variables and cotton yield data as an objective variable to predict cotton yield for different geographical sites in Pakistan. To attain an accurate GP-MCMC-copula model, the MCMC algorithm adopted a global optimization technique to find the best copula parameters. Evidently, the performance of the GP-MCMC based copula was found to be much better than the MCMC based

Acknowledgements

This research utilized cotton yield data acquired from the Pakistan Bureau of Statistics, Government of Pakistan: Islamabad, Pakistan and climate data were acquired from Pakistan Meteorological Department, Pakistan, that are duly acknowledged. This study was supported by the University of Southern Queensland’s Office of Graduate Studies Postgraduate Research Scholarship (2017–2019). We thank all reviewers and the journal Editor for their useful comments that have improved the clarity of the

References (87)

  • K. Mohammadi

    A new hybrid support vector machine–wavelet transform approach for estimation of horizontal global solar radiation

    Energy Convers. Manag.

    (2015)
  • J.E. Nash et al.

    River flow forecasting through conceptual models part I—a discussion of principles

    J. Hydrol. (Amst)

    (1970)
  • T. Nguyen-Huy et al.

    Copula-statistical precipitation forecasting model in Australia’s agro-ecological zones

    Agric. Water Manag.

    (2017)
  • T. Nguyen-Huy et al.

    Modeling the joint influence of multiple synoptic-scale, climate mode indices on Australian wheat yield using a vine copula-based approach

    Eur. J. Agron.

    (2018)
  • E.I. Papageorgiou et al.

    Fuzzy cognitive map based approach for predicting yield in cotton crop production as a basis for decision support system in precision agriculture application

    Appl. Soft Comput.

    (2011)
  • V.O. Snow

    The challenges–and some solutions–to process-based modelling of grazed agricultural systems

    Environ. Model. Softw.

    (2014)
  • Y. Yang et al.

    Prediction of cotton yield and water demand under climate change and future adaptation measures

    Agric. Water Manag.

    (2014)
  • X. Yuan et al.

    Wind power prediction using hybrid autoregressive fractionally integrated moving average and least square support vector machine

    Energy

    (2017)
  • Z. Zhisheng

    Quantum-behaved particle swarm optimization algorithm for economic load dispatch of power system

    Expert Syst. Appl.

    (2010)
  • B. Ahmad

    Supply response of cotton in Punjab: a time series analysis

    Pak. Cottons

    (1975)
  • D. Ahmad et al.

    Major crops forecasting area, production and yield evidence from agriculture sector of Pakistan

    Sarhad J. Agric.

    (2017)
  • H. Akaike

    A new look at the statistical model identification

    IEEE Trans. Automat. Contr.

    (1974)
  • S. Ali et al.

    Forecasting production and yield of sugarcane and cotton crops of Pakistan for 2013-2030

    Sarhad J. Agric.

    (2015)
  • C. Andrieu et al.

    A tutorial on adaptive MCMC

    Stat. Comput.

    (2008)
  • M. Ayaz et al.
    (2015)
  • T. Banuri

    Pakistan: Environmental Impact of Cotton Production and Trade

    (1998)
  • J. Briggs et al.

    Turbulent Mirror: an Illustrated Guide to Chaos Theory and the Science of Wholeness

    (1989)
  • C.E. Carpio et al.

    Forecasting foreign Cotton production: the case of India, Pakistan and Australia. Paper presented and published

    The Proceedings of The 2002 Beltwide Cotton Conference

    (2002)
  • G.-c. Chen et al.

    Particle swarm optimization algorithm

    Inf. Control Shenyang

    (2005)
  • C. Chen et al.

    Impact of climate change on cotton production and water consumption in Shiyang River Basin

    Trans. Chin. Soc. Agric. Eng.

    (2011)
  • D.G. Clayton

    A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence

    Biometrika

    (1978)
  • R.-G. Cong et al.

    The interdependence between rainfall and temperature: copula analyses

    Sci. World J.

    (2012)
  • C. Cortes et al.

    Support vector machine

    Mach. Learn.

    (1995)
  • L. Davis

    Handbook of Genetic Algorithms

    (1991)
  • L. De Lathauwer et al.

    Singular Value Decomposition, Proc. EUSIPCO-94

    (1994)
  • M. Debnath et al.

    Forecasting area, production and yield of cotton in India using ARIMA model

    Res. Rev. J. Space Sci. Technol.

    (2013)
  • R.C. Deo et al.

    Forecasting evaporative loss by least-square support-vector regression and evaluation with genetic programming, Gaussian process, and minimax probability machine regression: case study of Brisbane City

    J. Hydrol. Eng.

    (2017)
  • P.M. Department

    Dry Weather Predicted in the Country During Friday/Monday

    (2010)
  • T.G. Dietterich

    Ensemble learning

    (2002)
  • C.a.a.p.b Districts

    Crops Area and Production by Districts 1981-2008

    (2008)
  • N.R. Draper et al.

    Applied Regression Analysis

    (2014)
  • Q. Duan et al.

    Shuffled complex evolution approach for effective and efficient global minimization

    J. Opt. Theory Appl.

    (1993)
  • M.J. Fischer et al.

    A New Class of Copulas With Tail Dependence and a Generalized Tail Dependence Estimator

    (2006)