Elsevier

Geoderma

Volume 266, 15 March 2016, Pages 98-110
Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran

https://doi.org/10.1016/j.geoderma.2015.12.003

Highlights

  • Soil organic carbon was digitally mapped at a fine resolution in Iran.

  • Different data mining techniques were compared.

  • A soil depth function was applied to model SOC down to 1 m depth.

Abstract

This study aimed to map the lateral and vertical variation of soil organic carbon (SOC) down to 1 m depth in a semi-arid region in Kurdistan Province, Iran. Six data mining techniques, namely artificial neural networks, support vector regression, k-nearest neighbor, random forests, regression tree models, and genetic programming, were combined with equal-area smoothing splines, and their effectiveness in achieving this aim was evaluated and compared. Using the conditioned Latin hypercube sampling method, 188 soil profiles in the study area were sampled and their SOC content was measured. Eighteen ancillary variables derived from a digital elevation model and Landsat 8 images were used to represent the predictive soil forming factors in the study area. Findings showed that the normalized difference vegetation index and the wetness index were the most useful ancillary data for SOC mapping in the upper (0–15 cm) and bottom (60–100 cm) parts of the soil profiles, respectively. According to 5-fold cross-validation, artificial neural networks (ANNs) showed the highest performance for prediction of SOC at the four standard depths compared to all other data mining techniques. ANNs resulted in the lowest root mean square error and the highest Lin's concordance coefficient, which ranged from 0.07 to 0.20 log (kg/m³) and from 0.68 to 0.41, respectively, with the first value in each range being for the top of the profile and the second for the bottom. Furthermore, ANNs increased the performance of spatial prediction compared to the other data mining algorithms by up to 36, 23, 21 and 13% for the four soil depths, respectively, starting from the top of the profile. Overall, results showed that prediction of subsurface SOC variation needs improvement and the challenge remains to find appropriate covariates that can explain it.
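The two validation statistics named in the abstract can be sketched as follows. This is a minimal illustration using the standard definitions of root mean square error and Lin's concordance correlation coefficient (with population moments); the function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient, using population moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # Penalizes both scatter (through cov) and bias (through the mean shift).
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

# Hypothetical log-transformed SOC values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(rmse(obs, pred), lins_ccc(obs, pred))
```

Unlike a plain correlation coefficient, the concordance coefficient drops below 1 when predictions are systematically shifted, as in the sample above, where the predictions are perfectly correlated with the observations but offset by 0.5.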

Introduction

The Baneh area, located in Kurdistan Province, Iran, has suffered from deforestation in recent decades due to population growth. Forest areas were cleared to create land for cultivation to feed the growing population, which caused land degradation. SOC maps are useful for several purposes, namely increasing crop production, managing land degradation, and designing an effective C sequestration program for the area. However, there are no high-resolution maps that describe SOC in the topsoil and subsoil for Iran. Conventional soil mapping techniques have been criticized in the scientific literature for being subjective and qualitative in character, since soil maps are developed from a mental model held by the soil surveyors (Taghizadeh-Mehrjardi et al., 2015). Such qualitative maps, while helpful, can lead to ill-informed management decisions. To overcome these problems, digital soil mapping (DSM) techniques could be an efficient alternative approach. In DSM, soil properties are mapped digitally based on their relationship with cheaper-to-measure ancillary data (McBratney et al., 2003). Previous studies indicated that digital elevation models (DEM) and remotely sensed data are the most common ancillary data for SOC prediction (Malone et al., 2009, Mulder et al., 2011, Minasny et al., 2013, Dai et al., 2014, Were et al., 2015).

Numerous prediction methods have been developed and introduced to relate ancillary variables to soil organic carbon (SOC) through the DSM framework proposed by McBratney et al. (2003). Minasny et al. (2013) give a comprehensive review of SOC modeling. Most commonly, multiple and linear regression have been used for relating SOC to ancillary variables (Hengl et al., 2015). The latter technique is simple to apply and easy to interpret. Fewer studies have used generalized linear models (Karunaratne et al., 2014), regression tree models (Martin et al., 2011), random forest (Were et al., 2015, Hengl et al., 2015), artificial neural networks (Malone et al., 2009, Dai et al., 2014), support vector regression (Were et al., 2015), k-nearest neighbor (Mansuy et al., 2014) or genetic programming to construct the relationships between SOC content and ancillary variables. However, such modeling techniques have the potential to detect non-linear relationships and might therefore prove more powerful for digital SOC mapping. Unfortunately, a major drawback of these machine learning approaches is that they only map SOC spatial variability at specified depths or combinations of depth intervals, while SOC generally varies continuously within a typical soil profile. Soil carbon has been observed to decline rapidly with depth (Minasny et al., 2013). This variation can therefore be modeled using continuous soil depth functions (Malone et al., 2009) to create a 3D map describing the vertical and lateral variation of SOC. Many attempts have been made to derive functions of soil variation with depth (Mishra et al., 2009, Kempen et al., 2011). However, Bishop et al. (1999) suggested that equal-area quadratic splines are more flexible and practicable depth functions than other methods.
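To illustrate what harmonizing horizon data to fixed depth intervals involves, the sketch below resamples a profile to the four standard depths by thickness-weighted averaging. This is a deliberately simpler stand-in, not the equal-area quadratic spline of Bishop et al. (1999): it treats each horizon as a constant step, whereas the spline fits a smooth curve that preserves horizon means. The function name, the profile, and its values are hypothetical.

```python
# The four standard depth intervals (cm) used in the study.
STANDARD_DEPTHS_CM = [(0, 15), (15, 30), (30, 60), (60, 100)]

def resample_profile(horizons, intervals=STANDARD_DEPTHS_CM):
    """Thickness-weighted mean of horizon values over each target interval.

    horizons: list of (top_cm, bottom_cm, value) tuples.
    Returns one value per interval, or None where no horizon overlaps.
    """
    result = []
    for top, bottom in intervals:
        weighted_sum = 0.0
        covered = 0.0
        for h_top, h_bottom, value in horizons:
            # Thickness of this horizon that falls inside the target interval.
            overlap = min(bottom, h_bottom) - max(top, h_top)
            if overlap > 0:
                weighted_sum += overlap * value
                covered += overlap
        result.append(weighted_sum / covered if covered > 0 else None)
    return result

# Hypothetical profile with three horizons and SOC declining with depth.
profile = [(0, 20, 2.0), (20, 50, 1.0), (50, 100, 0.5)]
print(resample_profile(profile))
```

A step function like this produces abrupt jumps at horizon boundaries, which is one reason a smooth, mean-preserving depth function is generally preferred when SOC declines continuously with depth.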

Given the potential of soil depth functions (Malone et al., 2009) and the capabilities of digital soil mapping (Mulder et al., 2011), combining the two methods appears to be the only way to predict both the lateral and the vertical variation of soil properties. This paper therefore aims to predict spatial SOC variation using different digital soil mapping techniques (i.e. artificial neural network, support vector regression, k-nearest neighbor, random forest, regression tree model, and genetic programming) together with a depth function (the equal-area smoothing spline) in a semi-arid area of Iran.
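As an illustration of how one of the listed techniques can be evaluated, the sketch below runs a 5-fold cross-validation of a plain k-nearest neighbor regressor and reports the pooled root mean square error. It is a minimal sketch under stated assumptions: the value of k, the Euclidean distance, the fold assignment, and the synthetic covariates are all hypothetical, and the excerpt does not specify how the study configured its models or folds.

```python
import math
import random

def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def k_fold_rmse(x, y, n_folds=5, k=3, seed=0):
    """Shuffle once, split into n_folds, and pool the held-out squared errors."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        # Each sample is predicted only by a model that never saw it.
        for i in held_out:
            pred = knn_predict(train_x, train_y, x[i], k)
            squared_errors.append((y[i] - pred) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic covariates and target, for illustration only (not study data).
rng = random.Random(42)
covariates = [(rng.random(), rng.random()) for _ in range(50)]
target = [2.0 * a - b for a, b in covariates]
cv_rmse = k_fold_rmse(covariates, target)
print(cv_rmse)
```

In a multi-depth setting such as the one described here, the same procedure would be repeated per depth interval, so that each technique receives one error estimate per standard depth.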

Section snippets

Study area

The study area is located in Kurdistan Province, about 12 km northwest of Baneh, Iran (Fig. 1). It lies between latitudes 36.01° and 36.08° North and longitudes 45.66° and 45.83° East and covers 3000 ha. The climate is semi-arid with distinct differences between the dry (July–September) and wet (October–May) seasons. Average annual rainfall and temperature are 700 mm and 13.8 °C, respectively. Soil moisture and temperature regimes are Xeric and Mesic, respectively. The geomorphologic units

Data summary of SOC

The raw carbon data displayed a log-normal distribution and were therefore log-transformed prior to fitting the splines. Although the spline function is a non-linear equation and does not require any transformation, Malone et al. (2009) used log-transformed data to enhance SOC prediction performance. To test this assumption, we fitted the spline function to both raw and log-transformed values of SOC in the soil profiles. In terms of correlation between measured and predicted SOC, the best

Conclusion

As soil organic carbon is one of the most important soil properties, significant efforts have been made in the past to improve the accuracy of SOC estimation using statistical techniques. This paper has investigated SOC vertical and lateral variations down to 1 m depth in a semi-arid region of Iran. Here, using a soil database and a suite of ancillary variables, six data mining techniques were compared for each of the standardized depths: 0–15, 15–30, 30–60 and 60–100 cm. The

References (38)

  • B. Minasny et al.

    A conditioned Latin hypercube method for sampling in the presence of ancillary information

    Comput. Geosci.

    (2006)
  • B. Minasny et al.

    Digital mapping of soil carbon

    Adv. Agron.

    (2013)
  • V.L. Mulder et al.

    The use of remote sensing in soil and terrain mapping—a review

    Geoderma

    (2011)
  • R. Taghizadeh-Mehrjardi et al.

    Digital mapping of soil salinity in Ardakan region, central Iran

    Geoderma

    (2014)
  • R. Taghizadeh-Mehrjardi et al.

    Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran

    Geoderma

    (2015)
  • R.A. Welikala et al.

    Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy

    Comput. Med. Imaging Graph.

    (2015)
  • K. Were et al.

    A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape

    Ecol. Indic.

    (2015)
  • Z. Cheng-Ping et al.

    Research on hydrology time series prediction based on grey theory and e-support vector regression

  • P.F. Dai et al.

    Spatial prediction of soil organic matter content integrating artificial neural network and ordinary kriging in Tibetan Plateau

    Ecol. Indic.

    (2014)