Cotton yield prediction with Markov Chain Monte Carlo-based simulation model integrated with genetic programing algorithm: A new hybrid copula-driven approach
Introduction
Timely information on crop yield is important for agriculture-dependent nations (e.g., Pakistan), as it can inform agricultural policy making, forward planning and agricultural markets. Agriculture in Pakistan contributes about 21% of the country’s GDP (Sarwar, 2014), with cotton as an important cash crop. Cotton is integral to the economic development of Pakistan: the nation is highly dependent on the cotton industry and its related textile sector, and the crop has therefore been given a principal status in the country. Cotton is grown from May to August as an industrial crop on 15% of the nation's available land area, producing 15 million bales during 2014-15 (Reporter, 2015). Pakistan ranks fourth among cotton growers and is the third largest exporter and fourth largest consumer (Banuri, 1998). In 2013, about 1.6 million farmers (out of a total of 5 million in all sectors) were engaged in cotton farming, cultivating more than 3 million hectares (Banuri, 1998; Reporter, 2015).
Data-intelligent models that utilize past data can offer accurate solutions to problems related to the projection of future trends in agriculture, crop yield, rainfall and drought, all of which affect agricultural productivity (Ali et al., 2018a, b; Bauer, 1975; Nguyen-Huy et al., 2017, 2018). Machine learning models, which are highly non-linear, use input features that are informative for the prediction of crop yield. In the work of Kern et al. (2018), multiple linear regression models were constructed to simulate the yield of the four major crop types in Hungary using environmental and remote sensing information. Moreover, Bokusheva et al. (2016) developed copula models for crop yields based on VH indices, and Craparo et al. (2015) built an ARIMA model to forecast the decline of coffee yield in Tanzania. Debnath et al. (2013) predicted cotton area and yield in India using an ARIMA model. Blanc et al. (2008) utilized a multiple regression model of the main climatic determinants of rain-fed cotton yield in West Africa. Yang et al. (2014) assessed cotton yield and water demand under climate change and future adaptation measures using the APSIM-OzCot model. Chen et al. (2011) studied the impact of climate change on cotton production and water consumption in China using the COSIM model. Hearn (1994) designed a simulation model named OZCOT for cotton crop management in Australia. Papageorgiou et al. (2011) predicted cotton yield in Greece using fuzzy cognitive maps. Jin and Xu (2012) estimated cotton yield in China using the Carnegie Ames Stanford Approach model. The aforementioned models were developed to study the impacts of climate change on cotton yield prediction.
In summary, the existing literature shows that few studies in Pakistan have developed methods for the prediction of cotton yield, despite the country's standing as a world leader in cotton production. Ali et al. (2015) used an ARIMA model to forecast the production of sugarcane and cotton crops in Pakistan for 2013–2030. Hina Ali et al. (2013) also analyzed production forecasting of cotton in Pakistan. Ahmad et al. (2017) developed an ARIMA model to forecast the area, production and yield of major crops in Pakistan. Raza and Ahmad (2015) studied the impact of climate change on cotton productivity in Punjab and Sindh, Pakistan, using fixed effect models. Ayaz et al. (2015) studied the effect of weather on the cotton crop in Sindh, Pakistan. Carpio and Ramirez (2002) used yield and acreage models to forecast cotton yield in India, Pakistan and Australia. Ahmad (1975) designed a time series prediction for the supply response of cotton in Punjab, Pakistan.
The previous studies indicate that the prediction of cotton yield has been based primarily on the effect of climate change, with the ARIMA model as the main forecasting tool. In addition, these studies have been conducted for large areas, either a whole province or a national region, but not for a small locality. Moreover, advanced data-intelligent algorithms have rarely been applied to build more accurate prediction models at a micro scale, where they could support decision-making in precision agriculture and farming systems and may shape how future farming trends are analyzed. To address these issues, there is an apparent need for data-intelligent models that predict cotton yield more accurately and at a much finer scale than attempted previously. In this study, for the first time, a hybrid genetic programing integrated with a Markov Chain Monte Carlo (GP-MCMC) based copula model has been developed for the prediction of cotton yield in Faisalabad, Multan and Nawabshah in Pakistan. The novelty of this study lies in the use of as yet untested GP-MCMC based copula models for the prediction of cotton yield in Pakistan.
To advance the application of copula models, especially in agriculture where they have been applied relatively rarely, the present study addresses four primary objectives. (1) To apply GP-MCMC based copula models, MCMC based copula models and a standalone GP model to determine which of these models is the most accurate data-intelligent tool for predicting cotton yield in the developing nation of Pakistan. (2) To model the influence of the climate dataset (i.e., temperature, rainfall and humidity) in order to predict cotton yield effectively in the proposed districts of Punjab and Sindh, the primary agricultural hubs in Pakistan. (3) To develop and optimize the copula-based models by tuning the GP and the MCMC techniques, and to evaluate their performance in comparison with the MCMC based copula and standalone GP models. (4) To validate the predictive ability of each model with respect to cotton yield in Pakistan, making a major contribution to the use of data-driven models for agricultural yield estimation.
Theoretical framework
This section presents an overview of the proposed predictive GP-MCMC based copula models and their comparative counterparts, the MCMC based copula models and the standalone GP model.
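The MCMC step of such a framework can be illustrated with a minimal sketch. The excerpt does not state which copula families or which sampler the authors used, so everything below is an assumption made for illustration: a bivariate Gaussian copula with a single correlation parameter `rho`, a flat prior on (-1, 1), and a plain random-walk Metropolis sampler. The function names `pseudo_observations`, `gaussian_copula_loglik` and `metropolis_rho` are hypothetical and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def pseudo_observations(x):
    """Map a sample to (0, 1) through its ranks, scaled by n + 1."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)


def gaussian_copula_loglik(rho, z1, z2):
    """Log-likelihood of a Gaussian copula given normal scores z1 and z2."""
    one_minus = 1.0 - rho ** 2
    quad = rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2
    return float(np.sum(-0.5 * np.log(one_minus) - quad / (2.0 * one_minus)))


def metropolis_rho(u, v, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis draws of rho (assumed flat prior on (-1, 1))."""
    rng = np.random.default_rng(seed)
    inv_cdf = NormalDist().inv_cdf
    z1 = np.array([inv_cdf(p) for p in u])
    z2 = np.array([inv_cdf(p) for p in v])
    rho = 0.0
    logp = gaussian_copula_loglik(rho, z1, z2)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        proposal = rho + step * rng.normal()
        # Proposals outside (-1, 1) have zero prior mass and are rejected.
        if -1.0 < proposal < 1.0:
            logp_new = gaussian_copula_loglik(proposal, z1, z2)
            if np.log(rng.uniform()) < logp_new - logp:
                rho, logp = proposal, logp_new
        draws[i] = rho
    return draws


if __name__ == "__main__":
    # Synthetic stand-ins for forecasted and observed yield anomalies.
    rng = np.random.default_rng(42)
    forecast = rng.normal(size=300)
    observed = 0.7 * forecast + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=300)
    draws = metropolis_rho(pseudo_observations(forecast),
                           pseudo_observations(observed))
    print("posterior mean of rho:", draws[1000:].mean())
```

In this sketch the first 1000 draws are discarded as burn-in, and the remaining draws approximate the posterior of `rho`. A fuller treatment would compare several copula families and check sampler convergence.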
Materials and method
This section describes the acquired climate and cotton yield data, the study regions, the design of the predictive models and the performance criteria.
Results and discussion
The results of the GP-MCMC based copula model have been compared against the MCMC based copula models and a standalone GP model using the evaluation criteria described above (Eqs. (10)–(21)).
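The evaluation criteria of Eqs. (10)–(21) are not reproduced in this excerpt. As a rough sketch of how such a model comparison is typically scored, the snippet below assumes four widely used measures: the Pearson correlation coefficient, root mean square error, mean absolute error and Nash-Sutcliffe efficiency. The choice of measures and the function name `evaluate` are assumptions, not the paper's definitions.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return assumed skill scores for predicted versus observed yield."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    return {
        # Linear association between observed and predicted values.
        "r": float(np.corrcoef(obs, pred)[0, 1]),
        # Root mean square error, in the units of the yield data.
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Mean absolute error, in the units of the yield data.
        "mae": float(np.mean(np.abs(err))),
        # Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean.
        "nse": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }
```

Calling `evaluate` once per model on the same test period would give a small table of scores from which the competing models can be ranked.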
Fig. 6(a–c) demonstrates the joint dependence structure between GP based forecasted cotton yield and observed cotton yield anomalies using MCMC-copula models for the 33-year seasonal dataset. The asymmetric and skewed dependence structure of the
Conclusion
This paper has developed a suite of GP-MCMC based copula models using climate data (temperature, rainfall, humidity) as predictor variables and cotton yield data as an objective variable to predict cotton yield for different geographical sites in Pakistan. To attain an accurate GP-MCMC-copula model, the MCMC algorithm adopted a global optimization technique to find the best copula parameters. Evidently, the performance of the GP-MCMC based copula was found to be much better than the MCMC based
Acknowledgements
This research utilized cotton yield data acquired from the Pakistan Bureau of Statistics, Government of Pakistan, Islamabad, Pakistan, and climate data acquired from the Pakistan Meteorological Department, Pakistan; both sources are duly acknowledged. This study was supported by the University of Southern Queensland’s Office of Graduate Studies Postgraduate Research Scholarship (2017–2019). We thank all reviewers and the journal Editor for their useful comments that have improved the clarity of the
References (87)
- et al. An ensemble-ANFIS based uncertainty assessment model for forecasting multi-scalar standardized precipitation index. Atmos. Res. (2018)
- et al. Multi-stage hybridized online sequential extreme learning machine integrated with Markov Chain Monte Carlo copula-Bat algorithm for rainfall forecasting. Atmos. Res. (2018)
- The role of remote sensing in determining the distribution and yield of crops. Adv. Agron. (1975)
- et al. The climatic determinants of cotton yields: evidence from a plot in West Africa. Agric. For. Meteorol. (2008)
- et al. Satellite-based vegetation health indices as a criteria for insuring against drought-related yield losses. Agric. For. Meteorol. (2016)
- et al. Coffea arabica yields decline in Tanzania due to climate change: Global implications. Agric. For. Meteorol. (2015)
- et al. HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Model. Softw. (2007)
- et al. Comparison of some existing models for estimating global solar radiation for Antalya (Turkey). Energy Convers. Manag. (2000)
- et al. Extreme learning machine: theory and applications. Neurocomputing (2006)
- Statistical modelling of crop yield in Central Europe using climate data and remote sensing vegetation indices. Agric. For. Meteorol. (2018)